Achieving anonymity via clustering

Aggarwal, Gagan; Feder, Tomás; Kenthapadi, Krishnaram; Khuller, Samir; Panigrahy, Rina; Thomas, Dilys; An, Zhijie

doi:10.1145/1142351.1142374

Cited by 219 publications

(220 citation statements)

References 18 publications

Supporting

Mentioning

212

Contrasting

Unclassified

“…Our research is also related to the work of Aggarwal et al. [4], who proposed a new model of data anonymization based on clustering. While they develop several polynomial-time approximation algorithms, their basic modeling idea is, roughly, to cluster the rows of the input matrix and then to publish the "cluster centers"; importantly, it is required that each cluster contains at least k rows, and this corresponds to the k-anonymity concept.…”

Section: Input (mentioning)

confidence: 82%
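
The modeling idea in the statement above (group rows, require at least k rows per cluster, publish only the cluster centers) can be illustrated with a minimal sketch. It is not the algorithm of Aggarwal et al.: the function name `publish_cluster_centers`, the coordinate-wise mean used as the "center", and the sample rows are all assumptions made for illustration, and the cluster labels are taken as given rather than computed.

```python
from collections import defaultdict


def publish_cluster_centers(rows, labels, k):
    """Return one center per cluster; refuse to publish if a cluster has fewer than k rows.

    Assumption: rows are numeric tuples and the "center" is the coordinate-wise mean.
    """
    clusters = defaultdict(list)
    for row, label in zip(rows, labels):
        clusters[label].append(row)

    centers = {}
    for label, members in clusters.items():
        if len(members) < k:
            raise ValueError(
                f"cluster {label!r} has {len(members)} rows, fewer than k={k}"
            )
        dims = len(members[0])
        centers[label] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dims)
        )
    return centers


# Hypothetical two-attribute rows with a hand-picked clustering.
rows = [(30, 50), (32, 54), (60, 90), (64, 94)]
labels = [0, 0, 1, 1]
centers = publish_cluster_centers(rows, labels, k=2)
```

With k=2 both clusters are large enough, so only the two centers would be released; asking for k=3 on the same labels raises an error instead of publishing.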

Using Patterns to Form Homogeneous Teams

et al. 2013

Homogeneous team formation is the task of grouping individuals into teams, each of which consists of members who fulfill the same set of prespecified properties. In this theoretical work, we propose, motivate, and analyze a combinatorial model where, given a matrix over a finite alphabet whose rows correspond to individuals and columns correspond to attributes of individuals, the user specifies lower and upper bounds on team sizes as well as combinations of attributes that have to be homogeneous (that is, identical) for all members of the corresponding teams. Furthermore, the user can define a cost for assigning any individual to a certain team. We show that some special cases of our new model lead to NP-hard problems while others allow for (fixed-parameter) tractability results. For example, the problem is already NP-hard even if (i) there are no lower and upper bounds on the team sizes, (ii) all costs are zero, and (iii) the matrix has only two columns. In contrast, the problem becomes fixed-parameter tractable for the combined parameter "number of possible teams" and "number of different individuals", the latter being upper-bounded by the number of rows.

[…] That version concentrates on the anonymization aspects of the model. In our new version we slightly extend our model and show how it applies to (homogeneous) clustering of individuals, that is, to homogeneous team formation. Indeed, we now claim that the models and ideas better fit with these applications than with the previous data anonymization motivation. Apart from full proofs omitted in the extended abstract and also adapting our old ideas to the new extended model, the current article also contains a new and easier proof of NP-hardness, a new proof for showing that polynomial-time data reduction in terms of so-called polynomial-size problem kernels is unlikely to exist with respect to certain parameterizations, and a new algorithm for the (still NP-hard) special case ignoring costs. Many of the new findings are part of the diploma thesis [18].

“…Given a set C of n points on the plane, an r-gather-clustering is a partition of the points into clusters such that each cluster has at least r points. The r-gather-clustering problem [1] finds an r-gather-clustering minimizing the maximum radius among the clusters, where the radius of a cluster is the minimum radius of the disk which can cover the points in the cluster. A polynomial time 2-approximation algorithm for the problem is known [1].…”

Section: R-gather Clustering (mentioning)

confidence: 99%
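
The objective defined in the statement above can be evaluated directly for a given partition. The sketch below only scores a clustering (it is not the 2-approximation algorithm mentioned in the quote), and it finds each cluster's smallest covering disk by brute force over pairs and triples of points, which is only sensible for tiny inputs. The names `min_enclosing_radius` and `r_gather_cost` and the sample points are assumptions for illustration.

```python
import math
from itertools import combinations


def _covers(center, radius, points, eps=1e-9):
    """True if every point lies within `radius` of `center` (with a small tolerance)."""
    return all(math.dist(center, p) <= radius + eps for p in points)


def min_enclosing_radius(points):
    """Radius of the smallest disk covering `points`, by brute force."""
    if len(points) == 1:
        return 0.0
    best = math.inf
    # Candidate disks that have two of the points as a diameter.
    for p, q in combinations(points, 2):
        center = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
        radius = math.dist(p, q) / 2
        if radius < best and _covers(center, radius, points):
            best = radius
    # Candidate disks passing through three of the points (circumcircles).
    for a, b, c in combinations(points, 3):
        d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
        if abs(d) < 1e-12:
            continue  # collinear triple: no circumcircle
        a2, b2, c2 = (a[0] ** 2 + a[1] ** 2, b[0] ** 2 + b[1] ** 2, c[0] ** 2 + c[1] ** 2)
        ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
        uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
        radius = math.dist((ux, uy), a)
        if radius < best and _covers((ux, uy), radius, points):
            best = radius
    return best


def r_gather_cost(clusters, r):
    """Maximum cluster radius, after checking that every cluster has at least r points."""
    if any(len(cluster) < r for cluster in clusters):
        raise ValueError(f"every cluster needs at least r={r} points")
    return max(min_enclosing_radius(cluster) for cluster in clusters)


# Hypothetical partition of five points into two clusters.
clusters = [[(0, 0), (2, 0)], [(10, 0), (10, 4), (10, 2)]]
cost = r_gather_cost(clusters, r=2)
```

Here the first cluster has radius 1 and the second radius 2, so the partition's cost is 2; an r-gather-clustering algorithm would search over partitions to minimize this value.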

On r-Gatherings on the Line

Akagi

Nakano

2017

IEICE Trans. Inf. & Syst.

SUMMARY: In this paper we study a recently proposed variant of the facility location problem, called the r-gathering problem. Given an integer r, a set C of customers, a set F of facilities, and a connecting cost co(c, f) for each pair of c ∈ C and f ∈ F, an r-gathering of customers C to facilities F is an assignment A of C to open facilities F′ ⊆ F such that at least r customers are assigned to each open facility. We give an algorithm to find an r-gathering with the minimum cost, where the cost is max_{c ∈ C} co(c, A(c)), when all of C and F are on the real line.
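
To make the definition in the summary concrete, here is a small sketch that scores an assignment and finds an optimal one by exhaustive search. This is not the algorithm of the paper: it enumerates every assignment and is only usable on toy instances. Taking co(c, f) = |c − f| as the connecting cost on the line is an assumption, as are the function names and the sample positions.

```python
from collections import Counter
from itertools import product


def gathering_cost(customers, facilities, assignment, r):
    """Max connecting cost of `assignment`, or None if it is not an r-gathering.

    assignment[i] is the index of the facility serving customers[i];
    a facility counts as open when at least one customer is assigned to it.
    Assumption: co(c, f) = |c - f| for positions on the real line.
    """
    load = Counter(assignment)
    if any(count < r for count in load.values()):
        return None
    return max(abs(c - facilities[f]) for c, f in zip(customers, assignment))


def brute_force_r_gathering(customers, facilities, r):
    """Try every assignment and keep the first one with the smallest cost."""
    best_cost, best_assignment = None, None
    for assignment in product(range(len(facilities)), repeat=len(customers)):
        cost = gathering_cost(customers, facilities, assignment, r)
        if cost is not None and (best_cost is None or cost < best_cost):
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Hypothetical instance: five customers and three candidate facilities.
customers = [0.0, 1.0, 2.0, 9.0, 10.0]
facilities = [1.0, 9.5, 5.0]
cost, assignment = brute_force_r_gathering(customers, facilities, r=2)
```

On this instance the search opens the facilities at 1.0 and 9.5 and leaves the one at 5.0 closed, since serving anyone from 5.0 would only raise the maximum connecting cost.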

“…Most of the local recoding generalization algorithms follow clustering based approach where each cluster should satisfy anonymity requirement [1,2,6,10,14,19,28]. [2] Proposed condensation based approach where the data is condensed into multiple groups having pre-defined size.…”

Section: Related Work (mentioning)

confidence: 99%

“…However, the main limitation of this approach is, it produces high information loss because large numbers of records were merged into a single group. Gagan Aggrawal et al proposed r-gather clustering for anonymity where the data records are partitioned into clusters and release the cluster centres, along with their size, radius, and a set of associated sensitive values [14]. Grigorious et al addressed sampling based clustering for balancing the data utility and privacy protection.…”

Section: Related Work (mentioning)

confidence: 99%