Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2022
DOI: 10.18653/v1/2022.acl-short.1

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

Abstract: We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset of them) are being modified. We show that with small-to-medium training data, applying BitFit on pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, the method is competitive with other sparse fine-tuning methods. Besides their practical utility, these findings are relevant for the question of understanding the commonly-used process of finetuning…
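BitFit's recipe is simple enough to state in a few lines of code: load a pre-trained model, freeze everything except the bias terms (plus the randomly initialized task head), and train as usual. Below is a minimal sketch assuming PyTorch and the HuggingFace Transformers library; the checkpoint name, task head, and learning rate are illustrative choices, not the paper's exact experimental setup.

```python
# Bias-only (BitFit-style) fine-tuning sketch with PyTorch + HuggingFace Transformers.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # illustrative checkpoint and task
)

# Freeze everything except bias terms and the (newly initialized) classifier head.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = [p for p in model.parameters() if p.requires_grad]
print(f"trainable: {sum(p.numel() for p in trainable):,} "
      f"of {sum(p.numel() for p in model.parameters()):,} parameters")

# Only the unfrozen parameters go to the optimizer; the training loop itself
# is the standard full fine-tuning loop.
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```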

Cited by 197 publications (169 citation statements). References 17 publications.
“…Fine-tuning of large-scale language models (LMs) to get specialized models for specific tasks is known to be the best practice for optimizing task performance (Devlin et al., 2019; Aribandi et al., 2022) but is achieved at the significant cost of training and serving specialized models for many tasks. This motivates recent research on parameter-efficient tuning (Houlsby et al., 2019; Li and Liang, 2021; Ben Zaken et al., 2022), which focuses on tuning specialized models by updating a small number of their parameters. Yet, those specialized models fail to benefit from knowledge transfer across many tasks and leverage rich cross-task data (Liu…”
Section: Introduction (mentioning)
confidence: 92%
“…Parameter-efficient transfer learning. In addition to the approaches discussed in the previous sections (Houlsby et al., 2019; Ben Zaken et al., 2022; Li and Liang, 2021; Lester et al., 2021; Vu et al., 2022), many parameter-efficient transfer approaches have been introduced recently. Adapter-Fusion…”
Section: Additional Related Work (mentioning)
confidence: 99%
“…Structured pruning for finetuning specifically has seen various new findings. Ben-Zaken et al. (2021) propose Bias-terms Fine-tuning (BitFit), which freezes all pre-trained weights aside from bias terms for finetuning, resulting in diff masks with less than 0.1% of the original parameters. Since it does not introduce any new parameters or stochastic gates, this method is very simple to implement while almost reaching the performance of DiffPruning with BERT large on the GLUE benchmark.…”
Section: Parameter-efficient Learning (mentioning)
confidence: 99%
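As a rough sanity check on the "less than 0.1% of original parameters" figure quoted above, one can simply count bias parameters in a public BERT checkpoint. The sketch below assumes the HuggingFace Transformers library; bert-base-uncased is an illustrative choice, and the exact fraction varies with model size.

```python
# Count bias parameters relative to the full model (illustrative checkpoint).
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
total = sum(p.numel() for p in model.parameters())
biases = sum(p.numel() for n, p in model.named_parameters() if n.endswith(".bias"))
print(f"bias parameters: {biases:,} / {total:,} = {100 * biases / total:.3f}%")
```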
“…Recent works on parameter-efficient (PE) finetuning address this issue by introducing methods that alternatively rely on only changing a tiny set of extra parameters (Houlsby et al., 2019; Li and Liang, 2021; Hambardzumyan et al., 2021; Lester et al., 2021; Hu et al., 2022; He et al., 2022) or a small fraction of the existing model's parameters (Zaken et al., 2021; Gheini et al., 2021). These methods have been shown to be competitive with full fine-tuning despite modifying only as little as 0.01% of all the parameters (Liu et al., 2022).…”
Section: Introduction (mentioning)
confidence: 99%