2021
DOI: 10.48550/arxiv.2106.03164
Preprint

On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation

Cited by 11 publications (14 citation statements).
References 15 publications.
“…There are two series of related works: invasive methods and non-invasive methods. Invasive methods, which are built on the strong assumption that the inner structure (e.g., self-attention and feed-forward layers) of the PLM can be modified, include Prefix-Tuning (Li and Liang, 2021), BitFit (Ben Zaken et al., 2021), Child-Tuning, P-Tuning v2 (Liu et al., 2021b), LoRA (Hu et al., 2021), UnifiedSKG (Xie et al., 2022) and Adapter-based models (Rebuffi et al., 2017; Houlsby et al., 2019; Lin et al., 2020; He et al., 2021; Pfeiffer et al., 2021). Non-invasive methods, which only modify input embeddings and regard the inner structure as a black box, are mostly prompting methods (including our Input-Tuning).…”
Section: Related Work (mentioning)
confidence: 99%
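The "invasive" methods grouped together in this statement all insert or modify small trainable modules inside the PLM itself. As a concrete illustration, below is a minimal sketch of a Houlsby-style bottleneck adapter in PyTorch; the names (Adapter, bottleneck_size, mark_trainable) and sizes are illustrative assumptions, not code from any of the cited papers.

```python
# Minimal sketch of a Houlsby-style bottleneck adapter (an "invasive" method):
# a small down-project / nonlinearity / up-project block with a residual
# connection, inserted inside each Transformer layer of the PLM.
# Names and sizes here are illustrative, not taken from any specific codebase.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down_proj = nn.Linear(hidden_size, bottleneck_size)
        self.activation = nn.GELU()
        self.up_proj = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pretrained representation intact;
        # only the small adapter weights are trained for the downstream task.
        return hidden_states + self.up_proj(self.activation(self.down_proj(hidden_states)))


# During adapter-based tuning the PLM backbone is frozen and only the adapters
# (plus, typically, layer norms and the task head) are updated. This assumes
# the adapter submodules have "adapter" in their attribute names.
def mark_trainable(model: nn.Module) -> None:
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name.lower()
```

The key design choice is the residual connection around a small bottleneck: the number of trained parameters per task stays small while the pretrained weights remain frozen.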
“…Because models are more prone to overfitting in low-data regimes, we hypothesize that, with more parameters available to the model, full fine-tuning and prefix tuning with special token embeddings overfit the training data and produce worse results on test data. Furthermore, according to He et al [8], the lower layers of a PLM capture more generic features while upper layers capture more task-specific features. Therefore, if the special token embeddings in the embedding layer, the lowest layer of the PLM, come to overfit the training data, the model will not be able to generalize well to unseen data.…”
Section: Low Data Setting (mentioning)
confidence: 99%
“…The adapter's effectiveness was demonstrated over 26 diverse text classification tasks, where near full fine-tuning performance was achieved with only 3.6% additional parameters per task. Later, He et al [4] showed that adapter FT can mitigate the forgetting issues of full FT because of its smaller weight deviations. LoRA [6] is another example of adapter-based LFT, where the augmentation is implemented with a different design of small weight modules.…”
Section: 2.1 (mentioning)
confidence: 99%
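For comparison with the bottleneck adapter sketched earlier, below is a minimal sketch of the LoRA-style low-rank augmentation this statement refers to: the frozen pretrained weight is supplemented by a trainable low-rank product. The wrapper class LoRALinear and its defaults (rank, alpha) are illustrative assumptions, not the reference implementation.

```python
# Minimal sketch of a LoRA-style low-rank update (another adapter-like
# lightweight fine-tuning design): the frozen pretrained linear layer is
# augmented with a trainable low-rank product B @ A. Names and defaults
# are illustrative.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear
        self.base.weight.requires_grad = False  # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        in_features = base_linear.in_features
        out_features = base_linear.out_features
        # Low-rank factors: A starts random, B starts at zero, so the adapted
        # layer initially behaves exactly like the pretrained one.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the scaled low-rank correction.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Because lora_B is initialized to zero, the wrapped layer starts out identical to the frozen pretrained layer, and only the small low-rank factors are updated during fine-tuning.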