2021
DOI: 10.48550/arxiv.2110.06500
Preprint

Differentially Private Fine-tuning of Language Models

Abstract: We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utili…
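
The abstract describes a parameter-efficient approach to private fine-tuning. As a rough sketch only (not the authors' implementation; the adapter shape, function names, and hyperparameters below are assumptions), the general pattern is to freeze the pre-trained weights, train a small low-rank adapter, and update it with DP-SGD, i.e. per-example gradient clipping plus Gaussian noise:

# Illustrative sketch of parameter-efficient DP fine-tuning (DP-SGD applied
# only to a small trainable adapter). Names and hyperparameters are
# hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Small trainable module meant to sit on top of a frozen pre-trained layer."""
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d_out))

    def forward(self, x):
        return x @ self.A @ self.B

def dp_sgd_step(model, loss_fn, xs, ys, lr=1e-3, clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step over the trainable parameters of `model`:
    clip each example's gradient, add Gaussian noise, apply the noisy mean."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(xs, ys):  # per-example gradients
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)  # accumulate clipped per-example gradient

    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p.sub_(lr * (s + noise) / len(xs))  # noisy mean gradient step

Because only the adapter is trained, the clipped gradients (and therefore the added noise) live in a much lower-dimensional space than in full-model DP fine-tuning, which is broadly consistent with the utility and compute/memory gains the abstract reports.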

Cited by 15 publications (27 citation statements) | References 40 publications

“…Training data privacy can be protected using the differential privacy (DP) framework (Dwork et al, 2006), which ensures that the effect of any single training example on the trained model is not too large. Yu et al (2021); Li et al (2022) demonstrate the practicality of training differentially private LMs. However, the privacy guarantees achieved by differentially private training are weaker than reported when datasets contain duplicated records.…”
Section: Discussion
confidence: 99%
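
For reference, this statement appeals to the standard (ε, δ)-differential-privacy definition of Dwork et al. (2006); written out (this formulation is standard background, not quoted from the citing paper), a randomized mechanism M is (ε, δ)-DP if for all datasets D, D' differing in a single record and every set S of outputs:

% (epsilon, delta)-differential privacy (Dwork et al., 2006)
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta

Small ε and δ formalize "the effect of any single training example on the trained model is not too large"; duplicated records weaken the reported guarantee because one underlying example can correspond to several dataset records.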
“…Most work on Differential Privacy [6,10,22,24,32,44,47,53,53,57] uses public data either for: generic pre-training unrelated to the task [49], or to tune parameters [25,58,60], or as additional unlabeled data [37,38]. Instead, we use a small amount of labeled public data related to the task to improve the accuracy of a private model under a given privacy parameter ε.…”
Section: Related Work
confidence: 99%
“…DP for Language models. [34,57] show that a large language model pre-trained on generic public data can be finetuned on task-specific private data with only modest loss in accuracy. In contrast, our focus is on using small amounts of public data for fine-tuning.…”
Section: Related Work
confidence: 99%
“…To address these privacy concerns, there is a growing body of literature that aims to create privacy-preserving language models [64,2,56,98,84,40,79]. While humans navigate the complexities of language and privacy by identifying appropriate contexts for sharing information, LMs are not currently designed to do this [14,72,66,49,66,50,41].…”
Section: Introduction
confidence: 99%