Proceedings of the Twenty-Fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems 2006
DOI: 10.1145/1142351.1142374

Achieving anonymity via clustering

Abstract: Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of de-identifying records is to remove identifying fields such as social security number, name etc. However, recent research has shown that a large fraction of the US population can be identified using non-key attributes (called quasi-identifiers) such as date of birth, gender, and zip code [15]. Sweeney [16] proposed the k-anonymit…

Citations: cited by 219 publications (220 citation statements)
References: 18 publications
“…Our research is also related to the work of Aggarwal et al [4] who proposed a new model of data anonymization based on clustering. While they develop several polynomial-time approximation algorithms, their basic modeling idea is, roughly, to cluster the rows of the input matrix and then to publish the "cluster centers"; importantly, it is required that each cluster contains at least k rows, and this corresponds to the k-anonymity concept.…”
Section: Input (mentioning)
confidence: 82%
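The modeling idea in this statement (group rows into clusters of at least k rows, then publish only the cluster centers) can be illustrated with a minimal sketch. The grouping rule here, sorting rows and cutting them into consecutive chunks, is a deliberately naive assumption chosen for brevity; it is not one of the approximation algorithms from the cited paper. The function name `cluster_centers` and the sample rows are hypothetical.

```python
from statistics import mean


def cluster_centers(rows, k):
    """Group rows into clusters of at least k rows and return (center, size) pairs.

    Naive illustration only: rows are sorted and cut into consecutive chunks.
    This is an assumed stand-in, not the algorithm of the cited paper.
    """
    if k < 1 or len(rows) < k:
        raise ValueError("need at least k rows")
    ordered = sorted(rows)
    n_groups = len(ordered) // k
    groups = [ordered[i * k:(i + 1) * k] for i in range(n_groups)]
    # Leftover rows join the last group so that no cluster has fewer than k rows.
    groups[-1].extend(ordered[n_groups * k:])
    published = []
    for group in groups:
        # The center is the per-column mean of the rows in the cluster.
        center = tuple(mean(column) for column in zip(*group))
        published.append((center, len(group)))
    return published


# Hypothetical quasi-identifier rows: (age, income).
rows = [(25, 50000), (27, 52000), (26, 51000),
        (60, 90000), (62, 94000), (61, 92000), (63, 96000)]
released = cluster_centers(rows, k=3)
for center, size in released:
    print(center, size)
```

Only the centers and cluster sizes leave the function, so each published row stands for at least k original rows.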
“…Given a set C of n points on the plane an r-gather-clustering is a partition of the points into clusters such that each cluster has at least r points. The r-gather-clustering problem [1] finds an r-gather-clustering minimizing the maximum radius among the clusters, where the radius of a cluster is the minimum radius of the disk which can cover the points in the cluster. A polynomial time 2-approximation algorithm for the problem is known [1].…”
Section: R-gather Clustering (mentioning)
confidence: 99%
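The objective described in this statement can be made concrete with a short sketch that evaluates a given partition: it checks that every cluster has at least r points and returns the maximum cluster radius, where a cluster's radius is that of its minimum enclosing disk. The sketch only scores a partition; it does not search for an optimal one and is not the 2-approximation algorithm mentioned above. The helper names and sample points are hypothetical, and the brute-force disk search (over all pairs and triples of points) is an assumption suited to small clusters only.

```python
from itertools import combinations
from math import dist


def circumcircle(a, b, c):
    """Return (center, radius) of the circle through a, b, c, or None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy), dist((ux, uy), a)


def min_disk_radius(points, eps=1e-9):
    """Radius of the smallest disk covering all points (brute force)."""
    if len(points) == 1:
        return 0.0
    candidates = []
    # A minimum enclosing disk is fixed by two points (as a diameter) or by three.
    for a, b in combinations(points, 2):
        center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        candidates.append((center, dist(a, b) / 2))
    for a, b, c in combinations(points, 3):
        circle = circumcircle(a, b, c)
        if circle is not None:
            candidates.append(circle)
    return min(radius for center, radius in candidates
               if all(dist(center, p) <= radius + eps for p in points))


def max_cluster_radius(clusters, r):
    """Check the r-gather constraint and return the maximum cluster radius."""
    if any(len(cluster) < r for cluster in clusters):
        raise ValueError("every cluster must contain at least r points")
    return max(min_disk_radius(cluster) for cluster in clusters)


# Hypothetical partition of six points into two clusters, with r = 3.
clusters = [[(0, 0), (2, 0), (1, 0)], [(10, 0), (10, 4), (12, 2)]]
objective = max_cluster_radius(clusters, r=3)
print(objective)
```

An r-gather algorithm would minimize this value over all valid partitions; the brute-force disk search costs on the order of the fourth power of the cluster size, which is acceptable only for illustration.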
“…Most of the local recoding generalization algorithms follow clustering based approach where each cluster should satisfy anonymity requirement [1,2,6,10,14,19,28]. [2] Proposed condensation based approach where the data is condensed into multiple groups having pre-defined size.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, the main limitation of this approach is, it produces high information loss because large numbers of records were merged into a single group. Gagan Aggrawal et al proposed r-gather clustering for anonymity where the data records are partitioned into clusters and release the cluster centres, along with their size, radius, and a set of associated sensitive values [14]. Grigorious et al addressed sampling based clustering for balancing the data utility and privacy protection.…”
Section: Related Work (mentioning)
confidence: 99%