Utility-based anonymization using local recoding

Xu, Jian; Wang, Wei; Pei, Jian; Wang, Xiaoyuan; Shi, Bao‐Sen; Fu, Ada Wai-Chee

doi:10.1145/1150402.1150504

Cited by 326 publications

(318 citation statements)

References 11 publications

Supporting

Mentioning

302

Contrasting

Unclassified

“…To capture data utility, some criteria measure the utility loss that is incurred by generalization based on generalization hierarchies, such as Discernability Measure (DM) [1], Utility Measure (UM) [17], Relative Error (RE) [18], Normalized Certainty Penalty (NCP) [16] etc. DM and RE is calculated based on the number of generalized group and suppressed group that overlap with the original data.…”

Section: Utility Loss Measures (mentioning)

confidence: 99%
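
As a rough illustration of one of the measures named in the statement above, the Discernibility Measure is commonly formulated as the sum of squared equivalence-class sizes, so larger generalized groups cost more. The sketch below assumes that formulation and ignores suppressed records; the function name `discernibility`, the `qi_indices` parameter, and the sample table are hypothetical and do not come from the cited papers.

```python
from collections import Counter


def discernibility(records, qi_indices):
    """Sum of squared equivalence-class sizes over the quasi-identifier columns.

    Assumption: no suppressed records; each record is a tuple and
    qi_indices selects the quasi-identifier positions.
    """
    groups = Counter(tuple(r[i] for i in qi_indices) for r in records)
    return sum(size * size for size in groups.values())


# Hypothetical, already-generalized table: (postal code, gender, diagnosis).
table = [
    ("537**", "F", "flu"),
    ("537**", "F", "asthma"),
    ("537**", "M", "flu"),
    ("537**", "M", "flu"),
    ("537**", "M", "cold"),
]

# Two groups of sizes 2 and 3 give 2*2 + 3*3 = 13.
dm = discernibility(table, qi_indices=(0, 1))
```

A lower value indicates smaller groups and, under this measure, less utility loss.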

“…A subspace that contains at least k points forms a k-anonymous group [18]. The main idea of clustering-based anonymization is to create clusters containing at least k records in each cluster separately [16]. Fung et al [19] presented an effective top-down approach by introducing multiple virtual identifiers for utilizing information and privacy-guided specialization.…”

Section: Utility Enhancement Supervision Framework (mentioning)

confidence: 99%
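
The "at least k records in each cluster" condition in the statement above can be checked mechanically. The following is a minimal sketch, assuming records are tuples and that the quasi-identifier columns are known; `is_k_anonymous`, `qi_indices`, and the sample rows are hypothetical names and data used only for illustration.

```python
from collections import Counter


def is_k_anonymous(records, qi_indices, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[i] for i in qi_indices) for r in records)
    return all(size >= k for size in groups.values())


# Hypothetical generalized table: (postal code, gender, diagnosis).
table = [
    ("537**", "F", "flu"),
    ("537**", "F", "asthma"),
    ("537**", "M", "flu"),
    ("537**", "M", "flu"),
    ("537**", "M", "cold"),
]

two_anonymous = is_k_anonymous(table, qi_indices=(0, 1), k=2)    # groups of 2 and 3
three_anonymous = is_k_anonymous(table, qi_indices=(0, 1), k=3)  # the group of 2 fails
```

This only verifies the group-size condition; it says nothing about how the groups were formed or how much information was lost in forming them.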

Utility Enhancement for Privacy Preserving Health Data Publishing

Zaïane

2013

Advanced Data Mining and Applications

Abstract. In the medical field, we are amassing phenomenal amounts of data. This data is imperative in discovering patterns and trends to help improve healthcare. Yet the researchers cannot rejoice as the data cannot be easily shared, because health data custodians have the understandable ethical and legal responsibility to maintain the privacy of individuals. Many techniques of anonymization have been proposed to provide means of publishing data for research purposes without jeopardizing privacy. However, as flaws are discovered in these techniques, other more stringent methods are proposed. The strictness of the techniques is putting in question the utility of the data after severe anonymization. In this paper, we investigate several rigorous anonymization techniques with classification to evaluate the utility loss, and propose a framework to enhance the utility of anonymized data.

“…Even if the identifying attributes like name is removed, an attacker may be able to associate records with specific persons using combinations of other attributes (e.g., Postal Code; Gender; birth-date), called quasi-identifiers (QID suppression [5]. Generalization replaces their actual QID values with more general ones.…”

Section: Related Work (mentioning)

confidence: 99%
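
To make the idea of generalization in the statement above concrete, the sketch below replaces quasi-identifier values with coarser ones: a postal code keeps only a prefix, and a birth year is mapped to a fixed-width range. Both helper functions, their parameters, and the sample values are hypothetical illustrations, not the scheme used by any of the cited papers.

```python
def generalize_postal_code(code, keep):
    """Keep the first `keep` characters and mask the rest with '*'."""
    return code[:keep] + "*" * (len(code) - keep)


def generalize_year(year, width):
    """Map a year to the inclusive range of `width` years that contains it."""
    low = (year // width) * width
    return (low, low + width - 1)


# Hypothetical record: (postal code, gender, birth year).
record = ("53715", "F", 1987)

generalized = (
    generalize_postal_code(record[0], keep=3),  # "537**"
    record[1],                                  # gender left unchanged
    generalize_year(record[2], width=10),       # (1980, 1989)
)
```

Coarser values make more records share the same quasi-identifier combination, at the cost of precision in the published data.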

“…The Classification Metric (CM) [10] is suitable when the purpose of the anonymized data is to train a classifier, whereas the Discernibility Metric (DM) [9] measures the cardinality of the anonymized groups. More accurate is the Generalized Loss Metric [10] and the similar Normalized Certainty Penalty (NCP) [5]. In the case of categorical attributes NCP is defined with respect to the hierarchy.…”

Section: B Information Loss (mentioning)

confidence: 99%
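
The statement above notes that, for categorical attributes, NCP is defined with respect to the hierarchy. A minimal sketch of one common formulation follows: for a numeric attribute, the penalty is the width of the generalized interval divided by the width of the attribute's domain; for a categorical attribute, it is the number of leaves under the generalized node divided by the total number of leaves, and zero when the value is not generalized. The dictionary-based hierarchy, the function names, and the sample values are assumptions made for illustration.

```python
def ncp_numeric(low, high, domain_min, domain_max):
    """Penalty for generalizing a numeric value to the interval [low, high]."""
    if domain_max == domain_min:
        return 0.0
    return (high - low) / (domain_max - domain_min)


def leaves_under(hierarchy, node):
    """Count leaf values below `node`; a node with no children is a leaf."""
    children = hierarchy.get(node, [])
    if not children:
        return 1
    return sum(leaves_under(hierarchy, child) for child in children)


def ncp_categorical(hierarchy, node, root):
    """Penalty for generalizing a categorical value to `node` in the hierarchy."""
    if not hierarchy.get(node):
        return 0.0  # a leaf means the value was not generalized
    return leaves_under(hierarchy, node) / leaves_under(hierarchy, root)


# Hypothetical hierarchy: parent -> list of children.
countries = {
    "Any": ["Europe", "Asia"],
    "Europe": ["France", "Italy"],
    "Asia": ["China", "Japan", "India"],
}

age_penalty = ncp_numeric(20, 30, domain_min=0, domain_max=100)  # 10 / 100
region_penalty = ncp_categorical(countries, "Europe", "Any")     # 2 of 5 leaves
exact_penalty = ncp_categorical(countries, "France", "Any")      # not generalized
```

Penalties fall in the range 0 to 1 per attribute, so they can be weighted and summed across attributes and records to compare anonymizations.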

Personalized Privacy Preserving Publication of Transactional Datasets Using Concept Learning

Reddy, Raju, Kumari

2012

IJMLC

Abstract. In this paper we study the problem of protecting privacy in the publication of transactional data. Consider a collection of transactional data that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the point of view of the adversary. We define a new version of the anonymity guarantee using concept learning. Our anonymization model relies on generalization using concept hierarchy and concept learning. The proposed algorithms are experimentally evaluated using real world datasets.

SaC‐FRAPP: a scalable and cost‐effective framework for privacy preservation over big data on cloud

Zhang, Liu, Nepal, et al.

2013

Concurrency and Computation

SUMMARY Big data and cloud computing are two disruptive trends nowadays, provisioning numerous opportunities to the current information technology industry and research communities while posing significant challenges on them as well. Cloud computing provides powerful and economical infrastructural resources for cloud users to handle ever increasing data sets in big data applications. However, processing or sharing privacy‐sensitive data sets on cloud probably engenders severe privacy concerns because of multi‐tenancy. Data encryption and anonymization are two widely‐adopted ways to combat privacy breach. However, encryption is not suitable for data that are processed and shared frequently, and anonymizing big data and managing numerous anonymized data sets are still challenges for traditional anonymization approaches. As such, we propose a scalable and cost‐effective framework for privacy preservation over big data on cloud in this paper. The key idea of the framework is that it leverages cloud‐based MapReduce to conduct data anonymization and manage anonymous data sets, before releasing data to others. The framework provides a holistic conceptual foundation for privacy preservation over big data. Further, a corresponding proof‐of‐concept prototype system is implemented. Empirical evaluations demonstrate that the scalable and cost‐effective framework for privacy preservation can anonymize large‐scale data sets and manage anonymous data sets in a highly flexible, scalable, efficient, and cost‐effective fashion. Copyright © 2013 John Wiley & Sons, Ltd.
