Xuebin Ren scite author profile

High-dimensional crowdsourced data collected from numerous users produces rich knowledge about our society. However, it also brings unprecedented privacy threats to the participants. Local differential privacy (LDP), a variant of differential privacy, was recently proposed as a state-of-the-art privacy notion. Unfortunately, achieving LDP on high-dimensional crowdsourced data publication raises great challenges in terms of both computational efficiency and data utility. To this end, based on the Expectation Maximization (EM) algorithm and Lasso regression, we first propose efficient multi-dimensional joint distribution estimation algorithms with LDP. Then, we develop a Local differentially private high-dimensional data Publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, correlations among multiple attributes are identified to reduce the dimensionality of crowdsourced data, thus speeding up the distribution learning process and achieving high data utility. Extensive experiments on real-world datasets demonstrate that our multivariate distribution estimation scheme significantly outperforms existing estimation schemes in terms of both communication overhead and estimation speed. Moreover, LoPub can keep, on average, 80% and 60% accuracy over the released datasets in terms of SVM and random forest classification, respectively.

Index Terms: local differential privacy, high-dimensional data, crowdsourced data, data publication, private data release

X. Ren, S. Yang, and X. Yang are with Xi'an Jiaotong University ({xuebinren, shusenyang, yxyphd}@mail.xjtu.edu.cn). C.-M. Yu is with


Locally Private High-Dimensional Crowdsourced Data Release Based on Copula Functions

Wang, Yang, Ren, et al. 2022. IEEE Trans. Serv. Comput.


Local Differential Privacy for data collection and analysis

Wang, Zhao, et al. 2021. Neurocomputing.


High-Dimensional Crowdsourced Data Distribution Estimation with Local Privacy

Ren et al. 2016


High-dimensional crowdsourced data collected from numerous users produces rich knowledge for our society. However, it also brings unprecedented privacy threats to the participants. Local privacy, a variant of differential privacy, is proposed to eliminate privacy concerns. Unfortunately, achieving local privacy on high-dimensional crowdsourced data raises great challenges in terms of both computational efficiency and effectiveness. To this end, based on the Expectation Maximization (EM) algorithm and Lasso regression, we first propose efficient multi-dimensional joint distribution estimation algorithms that maintain local privacy. Then, we develop a Locally privacy-preserving high-dimensional data Publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, both correlations and joint distributions among multiple attributes are identified to reduce the dimensionality of crowdsourced data, thus achieving both efficiency and effectiveness in high-dimensional data publication. To the best of our knowledge, this is the first work addressing high-dimensional crowdsourced data publication with local privacy. Extensive experiments on real-world datasets demonstrate that our multivariate distribution estimation scheme significantly outperforms existing estimation schemes in terms of both communication overhead and estimation speed, and confirm that our LoPub scheme can retain, on average, 80% and 60% accuracy over the published approximate datasets in terms of SVM and random forest classification, respectively.
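The local privacy notion in the abstract above can be illustrated with binary randomized response, a standard textbook mechanism in which each user perturbs their own value before sending it. This is only a minimal sketch for a single binary attribute; it is not the LoPub algorithm or the EM/Lasso estimators described in the paper, and the function names and parameters are hypothetical.

```python
import math
import random


def randomize_bit(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it.

    The ratio of the two reporting probabilities is e^eps, which is the
    epsilon-local-differential-privacy condition for a single bit.
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def estimate_frequency(reports, epsilon):
    """Unbiased estimate of the fraction of users whose true bit is 1.

    If f is the true fraction, the expected observed fraction is
    (1 - p) + f * (2p - 1), so we invert that linear relation.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Illustrative synthetic population (an assumption): 30% of users hold bit 1.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
estimate = estimate_frequency(reports, 1.0)
print(f"estimated fraction of ones: {estimate:.3f}")
```

The collector never sees a true bit, yet the aggregate frequency can still be recovered approximately. Extending this idea to joint distributions over many attributes is where the cost grows quickly, which is the efficiency problem the abstract refers to.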


Copula-Based Multi-Dimensional Crowdsourced Data Synthesis and Release with Local Privacy

Yang, Wang, Ren, et al. 2017


Impact of Prior Knowledge and Data Correlation on Privacy Leakage: A Unified Analysis

Ren, Yang, et al. 2019. IEEE Trans. Inf. Forensics Secur.


It has been widely understood that differential privacy (DP) can guarantee rigorous privacy against adversaries with arbitrary prior knowledge. However, recent studies demonstrate that this may not be true for correlated data, and indicate that three factors could influence privacy leakage: the data correlation pattern, prior knowledge of adversaries, and sensitivity of the query function. This poses a fundamental problem: what is the mathematical relationship between the three factors and privacy leakage? In this paper, we present a unified analysis of this problem. A new privacy definition, named prior differential privacy (PDP), is proposed to evaluate privacy leakage considering the exact prior knowledge possessed by the adversary. We use two models, the weighted hierarchical graph (WHG) and the multivariate Gaussian model, to analyze discrete and continuous data, respectively. We demonstrate that positive, negative, and hybrid correlations have distinct impacts on privacy leakage. Considering general correlations, a closed-form expression of privacy leakage is derived for continuous data, and a chain rule is presented for discrete data. Our results are valid for general linear queries, including count, sum, mean, and histogram. Numerical experiments are presented to verify our theoretical analysis.
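For context on the count queries mentioned in the abstract above, the standard Laplace mechanism can be sketched as follows. This is a generic illustration of ordinary differential privacy, assuming independent records; it does not implement prior differential privacy (PDP) or the WHG and Gaussian analyses from the paper, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon.

    A count query has sensitivity 1: adding or removing one record
    changes the true answer by at most 1.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Illustrative data (an assumption): ages of 100 synthetic participants.
rng = random.Random(0)
ages = [20 + (i % 50) for i in range(100)]
noisy = private_count(ages, lambda a: a >= 50, 1.0, rng)
print(f"noisy count of participants aged 50 or over: {noisy:.2f}")
```

The noise scale here depends only on the sensitivity of the query and on epsilon. The abstract's point is that when records are correlated, the leakage an adversary can obtain also depends on the correlation pattern and on the adversary's prior knowledge, which this sketch does not account for.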


Xuebin Ren

Survey on Improving Data Utility in Differentially Private Sequential Data Publishing

Latent Dirichlet Allocation Model Training With Differential Privacy

LoPub: High-Dimensional Crowdsourced Data Publication With Local Differential Privacy
