Preserving privacy and utility during data publishing and data mining is essential for individuals, data providers, and researchers. However, studies in this area typically assume that each individual has only one record in a dataset, which is unrealistic in many applications. When an individual may have multiple records, new privacy leakage risks arise; we call such a dataset a 1:M dataset. In this paper, we propose a novel privacy model, (k, l)-diversity, that addresses disclosure risks in 1:M data publishing. Based on this model, we develop an efficient algorithm, 1:M-Generalization, to preserve both privacy and data utility, and compare it with alternative approaches. Extensive experiments on real-world data show that our approach outperforms the state-of-the-art technique in terms of both data utility and computational cost.
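The core requirement behind such a model can be illustrated with a small sketch. The check below assumes a simplified reading of (k, l)-diversity (not necessarily the paper's exact definition): every equivalence class in the anonymized 1:M table must cover at least k distinct individuals and contain at least l distinct sensitive values.

```python
from collections import defaultdict

def satisfies_kl_diversity(records, k, l):
    """Check a simplified (k, l)-diversity condition on anonymized 1:M data.

    records: iterable of (equivalence_class_id, individual_id, sensitive_value)
    tuples. Returns True iff every equivalence class covers at least k distinct
    individuals and contains at least l distinct sensitive values.
    """
    individuals = defaultdict(set)   # class -> distinct individuals
    sensitive = defaultdict(set)     # class -> distinct sensitive values
    for eq_class, person, value in records:
        individuals[eq_class].add(person)
        sensitive[eq_class].add(value)
    return all(len(individuals[c]) >= k and len(sensitive[c]) >= l
               for c in individuals)
```

A generalization algorithm in this setting would merge equivalence classes until every class passes such a check, trading off utility (coarser quasi-identifiers) against disclosure risk.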
Graph embedding maps a graph into low-dimensional vectors, i.e., an embedding matrix, while preserving the graph structure, thereby reducing the high computation and space costs of graph analysis. Matrix factorization (MF) is an effective means of graph embedding because it maintains the utility of the graph structure. However, the personalized structural features implied in the embedding matrix can identify individuals, potentially breaching sensitive information in the original graph. Protecting individual privacy without compromising utility is therefore the key to sharing the embedding matrix. Differential privacy is the gold standard for publishing sensitive information while protecting privacy. Existing methods for differentially private MF, however, cannot be directly applied to MF-based graph embedding, as they suffer either from high global sensitivity or from accumulated noise error over iterations, potentially rendering the utility of MF-based graph embedding poor. To address this deficiency, this study proposes PPGD, a differentially private perturbed gradient descent method for sharing MF-based graph embedding matrices. Specifically, a Lipschitz condition on the MF objective function and a gradient clipping strategy are devised to bound the global sensitivity. Along the way, a scalable bound on the global sensitivity that is independent of the original dataset is derived. Further, a composite noise addition mechanism for the gradient descent is designed to guarantee privacy while enhancing utility. Theoretical analysis shows that PPGD generates a processed embedding matrix that maximizes utility while achieving (ε, δ)-differential privacy. Experimental evaluations confirm the effectiveness and efficiency of PPGD.
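The interplay of gradient clipping and noise addition described above can be sketched in a single perturbed gradient step for a factorization A ≈ U Vᵀ. This is an illustrative DP-SGD-style step under assumed parameters (`clip`, `sigma`, `lr`), not the paper's exact PPGD mechanism or its composite noise design: clipping caps the gradient norm so the per-step sensitivity is bounded, and Gaussian noise scaled by `sigma * clip` is what a suitably calibrated (ε, δ)-DP Gaussian mechanism would add.

```python
import numpy as np

def dp_mf_step(A, U, V, lr, clip, sigma, rng):
    """One perturbed gradient descent step for MF A ~= U @ V.T.

    Clipping bounds each gradient's Frobenius norm by `clip`, so the
    contribution of any one entry of A to the update is bounded; Gaussian
    noise with scale sigma * clip then perturbs the clipped gradients.
    """
    R = U @ V.T - A                # residual of the current factorization
    gU = R @ V                     # gradient of 0.5*||R||^2 w.r.t. U
    gV = R.T @ U                   # gradient w.r.t. V
    for g in (gU, gV):             # gradient clipping to bound sensitivity
        norm = np.linalg.norm(g)
        if norm > clip:
            g *= clip / norm
    gU = gU + rng.normal(0.0, sigma * clip, gU.shape)  # Gaussian perturbation
    gV = gV + rng.normal(0.0, sigma * clip, gV.shape)
    return U - lr * gU, V - lr * gV
```

With `sigma = 0` this reduces to ordinary clipped gradient descent; the privacy/utility trade-off appears as `sigma` grows and the noise begins to dominate the clipped signal.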
Distributed clustering is an effective method for clustering data located at different sites. For the case of horizontally distributed data, the algorithm LDBDC (local density based distributed clustering) is presented, building on the existing algorithm DBDC (density based distributed clustering); by adopting ideas such as local density-based clustering and density attractors, it readily handles high-dimensional and abnormally distributed datasets. Theoretical analysis and experimental results show that LDBDC outperforms DBDC and SDBDC (scalable density based distributed clustering) in both clustering quality and efficiency.
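The two ingredients named above, a local density estimate and hill climbing toward a density attractor, can be sketched as follows. This is a generic Gaussian-kernel density with a mean-shift-style ascent in the spirit of DENCLUE, under an assumed bandwidth `h`; it is not the paper's exact LDBDC formulation, and a distributed deployment would run such local steps per site before merging local representatives.

```python
import numpy as np

def local_density(points, x, h):
    """Gaussian-kernel density estimate at point x with bandwidth h."""
    d2 = np.sum((points - x) ** 2, axis=1)      # squared distances to x
    return float(np.sum(np.exp(-d2 / (2.0 * h * h))))

def density_attractor(points, x, h, steps=50):
    """Hill-climb x uphill on the density surface toward its attractor,
    using the mean-shift update (kernel-weighted mean of the points)."""
    for _ in range(steps):
        d2 = np.sum((points - x) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * h * h))
        x = (w[:, None] * points).sum(axis=0) / w.sum()
    return x
```

Points whose hill-climbing paths end at the same attractor are assigned to the same local cluster, which is what makes the scheme robust to clusters of arbitrary shape.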