Federated learning facilitates the collaborative training of models without the sharing of raw data. However, recent attacks demonstrate that simply maintaining data locality during training processes does not provide sufficient privacy guarantees. Rather, we need a federated learning system capable of preventing inference over both the messages exchanged during training and the final trained model while ensuring the resulting model also has acceptable predictive accuracy. Existing federated learning approaches either use secure multiparty computation (SMC) which is vulnerable to inference or differential privacy which can lead to low accuracy given a large number of parties with relatively small amounts of data each. In this paper, we present an alternative approach that utilizes both differential privacy and SMC to balance these trade-offs. Combining differential privacy with secure multiparty computation enables us to reduce the growth of noise injection as the number of parties increases without sacrificing privacy while maintaining a pre-defined rate of trust. Our system is therefore a scalable approach that protects against inference threats and produces models with high accuracy. Additionally, our system can be used to train a variety of machine learning models, which we validate with experimental results on 3 different machine learning algorithms. Our experiments demonstrate that our approach out-performs state of the art solutions.
Federated learning (FL) is an emerging paradigm for distributed training of large-scale deep neural networks in which participants' data remains on their own devices with only model updates being shared with a central server. However, the distributed nature of FL gives rise to new threats caused by potentially malicious participants. In this paper, we study targeted data poisoning attacks against FL systems in which a malicious subset of the participants aim to poison the global model by sending model updates derived from mislabeled data. We first demonstrate that such data poisoning attacks can cause substantial drops in classification accuracy and recall, even with a small percentage of malicious participants. We additionally show that the attacks can be targeted, i.e., they have a large negative impact only on classes that are under attack. We also study attack longevity in early/late round training, the impact of malicious participant availability, and the relationships between the two. Finally, we propose a defense strategy that can help identify malicious participants in FL to circumvent poisoning attacks, and demonstrate its effectiveness.
Deep learning techniques based on neural networks have shown significant success in a wide range of AI tasks. Large-scale training datasets are one of the critical factors for their success. However, when the training datasets are crowdsourced from individuals and contain sensitive information, the model parameters may encode private information and bear the risks of privacy leakage. The recent growing trend of the sharing and publishing of pre-trained models further aggravates such privacy risks. To tackle this problem, we propose a differentially private approach for training neural networks. Our approach includes several new techniques for optimizing both privacy loss and model accuracy. We employ a generalization of differential privacy called concentrated differential privacy(CDP), with both a formal and refined privacy loss analysis on two different data batching methods. We implement a dynamic privacy budget allocator over the course of training to improve model accuracy. Extensive experiments demonstrate that our approach effectively improves privacy loss accounting, training efficiency and model quality under a given privacy budget.
Membership inference attacks seek to infer membership of individual training instances of a model to which an adversary has black-box access through a machine learning-as-a-service API. In providing an in-depth characterization of membership privacy risks against machine learning models, this paper presents a comprehensive study towards demystifying membership inference attacks from two complimentary perspectives. First, we provide a generalized formulation of the development of a black-box membership inference attack model. Second, we characterize the importance of model choice on model vulnerability through a systematic evaluation of a variety of machine learning models and model combinations using multiple datasets. Through formal analysis and empirical evidence from extensive experimentation, we characterize under what conditions a model may be vulnerable to such black-box membership inference attacks. We show that membership inference vulnerability is data-driven and corresponding attack models are largely transferable. Though different model types display different vulnerabilities to membership inference, so do different datasets. Our empirical results additionally show that (1) using the type of target model under attack within the attack model may not increase attack effectiveness and (2) collaborative learning exposes vulnerabilities to membership inference risks when the adversary is a participant. We also discuss countermeasure and mitigation strategies.
No abstract
Differentially private location trace synthesis (DPLTS) has recently emerged as a solution to protect mobile users' privacy while enabling the analysis and sharing of their location traces. A key challenge in DPLTS is to best preserve the utility in location trace datasets, which is non-trivial considering the high dimensionality, complexity and heterogeneity of datasets, as well as the diverse types and notions of utility. In this paper, we present OptaTrace: a utility-optimized and targeted approach to DPLTS. Given a real trace dataset D, the differential privacy parameter ε controlling the strength of privacy protection, and the utility/error metric Err of interest; OptaTrace uses Bayesian optimization to optimize DPLTS such that the output error (measured in terms of given metric Err) is minimized while εdifferential privacy is satisfied. In addition, OptaTrace introduces a utility module that contains several built-in error metrics for utility benchmarking and for choosing Err, as well as a frontend web interface for accessible and interactive DPLTS service. Experiments show that OptaTrace's optimized output can yield substantial utility improvement and error reduction compared to previous work.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.