Over the last decade, differential privacy (DP) has emerged as the gold standard for rigorous and provable privacy. However, practical guidelines for applying differential privacy remain scarce, and a key challenge is how to set an appropriate value for the privacy parameter ε. In this work, we employ the statistical tool of hypothesis testing to derive useful and interpretable guidelines for state-of-the-art privacy-preserving frameworks. We formalize and implement hypothesis testing in terms of an adversary's capability to infer mutually exclusive sensitive information about the input data (such as whether an individual has participated or not) from the output of the privacy-preserving mechanism. We quantify the success of the hypothesis test using the precision-recall relation, which provides an interpretable and natural guideline for practitioners and researchers on selecting ε. Our key results include a quantitative analysis of how hypothesis testing can guide the choice of the privacy parameter ε in an interpretable manner for a differentially private mechanism and its variants. Importantly, our findings show that an adversary's auxiliary information, in the form of a prior distribution over the database and correlation across records and time, indeed influences the proper choice of ε. Finally, we show how the hypothesis-testing perspective provides useful insights into the relationships among a broad range of privacy frameworks, including differential privacy, Pufferfish privacy, Blowfish privacy, dependent differential privacy, inferential privacy, membership privacy, and mutual-information-based differential privacy.
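To make the precision-recall guideline concrete, here is a minimal simulation sketch (not the authors' implementation), assuming a Laplace mechanism on a counting query where the adversary tests whether a target individual participated; the midpoint decision threshold and the equal priors on the two hypotheses are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_count, epsilon, sensitivity=1.0, n=100_000):
    """Release a counting query n times with Laplace noise calibrated to epsilon."""
    return true_count + rng.laplace(scale=sensitivity / epsilon, size=n)

# Neighboring databases: the target individual is absent (count = 10) or present (count = 11).
epsilon = 1.0
out_h0 = laplace_mechanism(10, epsilon)  # H0: individual did not participate
out_h1 = laplace_mechanism(11, epsilon)  # H1: individual participated

# The likelihood-ratio test for Laplace noise with equal priors reduces to a
# midpoint threshold: claim "participated" when the output exceeds 10.5.
threshold = 10.5
recall = np.mean(out_h1 > threshold)           # P(detect H1 | H1 true)
false_pos = np.mean(out_h0 > threshold)        # P(claim H1 | H0 true)
precision = recall / (recall + false_pos)      # valid under equal priors
print(f"epsilon={epsilon}: recall={recall:.3f}, precision={precision:.3f}")
```

Sweeping `epsilon` in this simulation traces out how the adversary's precision-recall trade-off tightens as the privacy parameter grows, which is the kind of curve the paper uses as an interpretable selection guideline.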
Differential privacy mechanism design has traditionally been tailored for a scalar-valued query function. Although many mechanisms, such as the Laplace and Gaussian mechanisms, can be extended to a matrix-valued query function by adding i.i.d. noise to each element of the matrix, this approach is often suboptimal because it forfeits the opportunity to exploit the structural characteristics typically associated with matrix analysis. To address this challenge, we propose a novel differential privacy mechanism called the Matrix-Variate Gaussian (MVG) mechanism, which adds matrix-valued noise drawn from a matrix-variate Gaussian distribution, and we rigorously prove that the MVG mechanism preserves (ϵ, δ)-differential privacy. Furthermore, we introduce the concept of directional noise, made possible by the design of the MVG mechanism, which allows the impact of the noise on the utility of the matrix-valued query function to be moderated. Finally, we experimentally demonstrate the performance of our mechanism using three matrix-valued queries on three privacy-sensitive datasets. We find that the MVG mechanism notably outperforms four previous state-of-the-art approaches and provides utility comparable to the non-private baseline.
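As a rough illustration of adding matrix-variate Gaussian noise, the sketch below samples from MN(0, Σ_row, Σ_col) via Cholesky factors of the two covariances; the covariances shown are arbitrary placeholders and are not calibrated to (ϵ, δ), since the paper derives that calibration from the query's sensitivity.

```python
import numpy as np

rng = np.random.default_rng(0)

def mvg_noise(m, n, row_cov, col_cov):
    """Sample an m-by-n matrix from the zero-mean matrix-variate Gaussian
    MN(0, row_cov, col_cov), using Cholesky factors of both covariances."""
    A = np.linalg.cholesky(row_cov)   # row_cov = A @ A.T
    B = np.linalg.cholesky(col_cov)   # col_cov = B @ B.T
    Z = rng.standard_normal((m, n))   # i.i.d. N(0, 1) entries
    return A @ Z @ B.T                # vec(noise) has covariance col_cov ⊗ row_cov

# Hypothetical 3x4 matrix-valued query answer with "directional" noise:
# more variance along the first row direction, less along the others.
query_answer = np.arange(12.0).reshape(3, 4)
row_cov = np.diag([4.0, 1.0, 1.0])    # anisotropic row covariance (placeholder)
col_cov = np.eye(4)                   # isotropic column covariance (placeholder)
private_answer = query_answer + mvg_noise(3, 4, row_cov, col_cov)
```

The anisotropic `row_cov` hints at the directional-noise idea: the noise budget can be concentrated along directions where the query is less sensitive, rather than spread uniformly as with i.i.d. perturbation.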
In the Internet era, the data being collected on consumers like us are growing exponentially, and attacks on our privacy are becoming a real threat. To better ensure our privacy, it is safer to let the data owner control the data to be uploaded to the network rather than taking chances with data servers or third parties. To this end, we propose compressive privacy, a privacy-preserving technique that enables the data creator to compress data via collaborative learning so that the compressed data uploaded onto the Internet will be useful only for the intended utility and cannot be easily diverted to malicious applications. For data in a high-dimensional feature vector space, a common approach to data compression is dimension reduction or, equivalently, subspace projection. The most prominent tool is principal component analysis (PCA). For unsupervised learning, PCA can best recover the original data given a specific reduced dimensionality. For the supervised learning environment, however, it is more effective to adopt a supervised PCA, known as discriminant component analysis (DCA), to maximize the discriminant capability. The DCA subspace analysis embraces two different subspaces: the signal-subspace components of DCA are associated with the discriminant distance/power (related to classification effectiveness), whereas the noise-subspace components of DCA are tightly coupled with recoverability and/or privacy protection. This article presents three DCA-related data compression methods useful for privacy-preserving applications (a sketch of the underlying projection follows the list):
- Utility-driven DCA: Because the rank of the signal subspace is limited by the number of classes, DCA can effectively support classification using a relatively small dimensionality (i.e., high compression).
- Desensitized PCA: Incorporating a signal-subspace ridge into DCA leads to a variant especially effective for extracting privacy-preserving components. In this case, the eigenvalues of the noise space are made insensitive to the privacy labels and are ordered according to their corresponding component powers.
- Desensitized K-means/SOM: Since revealing the K-means or SOM cluster structure could leak sensitive information, it is safer to perform K-means or SOM clustering on a desensitized PCA subspace.
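For intuition, here is a minimal sketch of one common DCA-style formulation, assuming the discriminant projection is obtained from a ridge-regularized generalized eigenproblem between the between-class scatter and the total scatter; the article's specific variants (utility-driven DCA, desensitized PCA) refine this basic recipe, and the ridge parameter `rho` here is an illustrative placeholder.

```python
import numpy as np
from scipy.linalg import eigh

def dca(X, y, k, rho=1e-3):
    """Sketch of a DCA-style projection: top-k generalized eigenvectors of the
    between-class scatter S_b against the ridge-regularized total scatter.
    The rank of the discriminant (signal) subspace is at most n_classes - 1."""
    mean = X.mean(axis=0)
    Xc = X - mean
    S_bar = Xc.T @ Xc                          # total scatter matrix
    S_b = np.zeros_like(S_bar)                 # between-class scatter matrix
    for c in np.unique(y):
        n_c = np.sum(y == c)
        m_c = X[y == c].mean(axis=0) - mean
        S_b += n_c * np.outer(m_c, m_c)
    # Generalized eigenproblem: S_b w = lam * (S_bar + rho * I) w
    vals, vecs = eigh(S_b, S_bar + rho * np.eye(X.shape[1]))
    W = vecs[:, -k:]                           # top-k discriminant components
    return Xc @ W, W                           # compressed data and projection
```

In the compressive-privacy setting, only the k-dimensional projection `Xc @ W` would leave the data owner's device, preserving the intended classification utility while discarding the remaining (privacy-coupled) subspace.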
A key challenge in designing differential privacy for the non-interactive setting is maintaining the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model, which leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ε-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications (clustering, classification, and regression) on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) the loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
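A loose end-to-end sketch of the RON-Gauss idea follows; the QR-based orthonormal projection is standard, but the Laplace noise scales placed on the Gaussian model's parameters are hypothetical placeholders rather than the paper's exact calibration (which involves data pre-normalization and a precise sensitivity analysis).

```python
import numpy as np

rng = np.random.default_rng(0)

def ron_projection(d, p):
    """Random orthonormal (RON) projection: QR-orthonormalize a random
    Gaussian matrix to obtain a p-dimensional orthonormal basis of R^d."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, p)))  # Q.T @ Q = I_p
    return Q

def ron_gauss_release(X, p, epsilon):
    """Project the data, perturb the Gaussian model's mean and covariance,
    and sample synthetic data from the noisy model. The noise scales below
    are illustrative placeholders, not the paper's calibrated sensitivities."""
    n, d = X.shape
    W = ron_projection(d, p)
    Z = X @ W                                   # projected data (near-Gaussian by DFM)
    mu = Z.mean(axis=0) + rng.laplace(scale=2 * np.sqrt(p) / (n * epsilon), size=p)
    Zc = Z - mu
    cov = Zc.T @ Zc / n + rng.laplace(scale=2 * p / (n * epsilon), size=(p, p))
    # Re-symmetrize and project the noisy covariance back to the PSD cone.
    vals, vecs = np.linalg.eigh((cov + cov.T) / 2)
    cov_psd = (vecs * np.clip(vals, 1e-6, None)) @ vecs.T
    return rng.multivariate_normal(mu, cov_psd, size=n), W
```

Because the Gaussian model is fit in the p-dimensional projected space rather than the original d-dimensional space, the noise needed for privacy perturbs far fewer parameters, which is the intuition behind RON projection lowering the required perturbation.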