Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security
DOI: 10.1145/3460120.3484533

Honest-but-Curious Nets: Sensitive Attributes of Private Inputs Can Be Secretly Coded into the Classifiers' Outputs

Abstract: It is known that deep neural networks, trained for the classification of a non-sensitive target attribute, can reveal sensitive attributes of their input data through features of different granularity extracted by the classifier. Taking a step forward, we show that deep classifiers can be trained to secretly encode a sensitive attribute of users' input data, at inference time, into the classifier's outputs for the target attribute. An attack that works even if users have a white-box view of the classifier, a…
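The mechanism sketched in the abstract can be illustrated with a minimal, hedged example: the classifier is trained jointly with a server-side decoder so that the released target-attribute probabilities also carry the sensitive attribute. This is only a sketch assuming PyTorch, a toy MLP on synthetic data, and a plain cross-entropy surrogate for the paper's information-theoretic objective; the names, shapes, and hyperparameters below are illustrative, not the authors' code.

# Minimal sketch (assumed PyTorch, synthetic data, cross-entropy surrogate loss).
import torch
import torch.nn as nn

D, N_TARGET, N_SENSITIVE = 32, 10, 2   # illustrative input/label dimensions

# "Honest" classifier that users run (possibly with a white-box view of it).
classifier = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, N_TARGET))
# "Curious" decoder kept by the server; it only ever sees the released outputs.
decoder = nn.Sequential(nn.Linear(N_TARGET, 16), nn.ReLU(), nn.Linear(16, N_SENSITIVE))

opt = torch.optim.Adam(list(classifier.parameters()) + list(decoder.parameters()), lr=1e-3)
ce = nn.CrossEntropyLoss()
lam = 1.0  # trade-off between target-task accuracy and covert leakage

for step in range(1000):
    x = torch.randn(128, D)                    # private inputs
    y = torch.randint(0, N_TARGET, (128,))     # non-sensitive target labels
    s = torch.randint(0, N_SENSITIVE, (128,))  # sensitive attribute

    logits = classifier(x)
    probs = logits.softmax(dim=1)              # the only thing users release
    loss = ce(logits, y) + lam * ce(decoder(probs), s)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference time the server receives only `probs` for the target attribute,
# yet decoder(probs).argmax(dim=1) recovers the sensitive attribute.

In this toy formulation, lam controls how much of the output's capacity is spent on the covert channel rather than the honest target task.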

Cited by 22 publications (10 citation statements)
References: 59 publications
“…The attackers can then obtain the node embedding matrix from the data holder through the rogue provider. This attack scenario is in line with the malicious machine learning provider scenario discussed by Song et al [65] and Malekzadeh et al [47].…”
Section: Attack Scenarios (supporting)
confidence: 80%
“…They cannot interact with the node embedding models since such pipelines usually operate in one direction. For instance, the data holder may have integrated with malicious machine learning solution providers (i.e., MLaaS providers) from the AWS Marketplace [47,65], or the data holder may be part of a vertical federated learning environment in an enterprise [71]. In both cases, the node embeddings are part of the learning process and can be obtained by either the malicious MLaaS providers [47,65] or the insiders [71] in the pipeline.…”
Section: Introduction (mentioning)
confidence: 99%
“…Attribute inference attack is closer to the problem of data imputation, where, given non-sensitive attributes, the goal is to predict the sensitive attribute [11,14,15,41]. Malekzadeh et al [29] leverage an information-theoretic view to infer sensitive attributes from output predictions. However, their setting is different in that they consider a malicious model designer who injects a sensitive attribute to be inferred later after deployment.…”
Section: Related Work (mentioning)
confidence: 99%
“…Furthermore, given black-box access to a language model's pre-training and fine-tuning stages, Zanella-Béguelin et al (2020) showed that sensitive sequences from the fine-tuning dataset can be extracted. For the distributed client-server setup, Malekzadeh et al (2021) considered sensitive-attribute leakage on the server side via honest-but-curious (HBC) classifiers.…”
Section: Related Work (mentioning)
confidence: 99%