Towards a Unified View of Parameter-Efficient Transfer Learning
Preprint, 2021
DOI: 10.48550/arxiv.2110.04366

Abstract: Fine-tuning large pretrained language models on downstream tasks has become the de facto learning paradigm in NLP. However, conventional approaches fine-tune all the parameters of the pretrained model, which becomes prohibitive as the model size and the number of tasks grow. Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance. While effective, the critical ingredients for success and the connect…

Cited by 31 publications (55 citation statements)
References 26 publications
“…Although each of these three approaches has its own focus, the central idea is to keep the pre-trained parameters constant while training lightweight alternatives to achieve adaptation for downstream tasks. There have also been some recent attempts to grasp the internal connection of these strategies and build a unified parameter-efficient tuning framework [333,334].…”
Section: Parameter-efficient Tuning (mentioning)
confidence: 99%
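The central idea quoted above lends itself to a short sketch: keep every pretrained weight frozen and train only a lightweight residual module. The following is a minimal, illustrative PyTorch sketch rather than code from the cited papers; the `BottleneckAdapter` class, the bottleneck width, and the name-based freezing convention are assumptions made here for exposition.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Houlsby-style residual adapter: down-project, nonlinearity,
    up-project, plus a residual connection back to the input."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))

def freeze_backbone_train_adapters(model: nn.Module) -> None:
    """Keep pretrained parameters constant; only adapter parameters get gradients.
    Assumes (for this sketch) that adapter modules carry 'adapter' in their name."""
    for name, p in model.named_parameters():
        p.requires_grad = "adapter" in name
```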
“…Moreover, we show that our method can be used in tandem with several parameter-efficient methods (He et al., 2021) in order to make the increase in time and space complexity due to skill-specific parameters negligible. In particular, we explore sparse adaptation with Lottery-Ticket Sparse Fine-Tuning (LT-SFT; Ansell et al., 2022) and low-rank adaptation with Low-Rank Adapters (LoRA; Hu et al., 2021).…”
Section: Fine-grained Skill Selection (mentioning)
confidence: 99%
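As a rough companion to the low-rank adaptation mentioned in this excerpt (LoRA; Hu et al., 2021), here is a hedged PyTorch sketch of wrapping a frozen linear layer with trainable low-rank factors; the class name, default rank, and scaling are illustrative choices, not the reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adaptation of a frozen linear layer: the frozen base output is
    augmented by a trainable low-rank update scaled by alpha / r."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pretrained weight stays fixed
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # update starts at zero
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```

Because the up-projection starts at zero, the wrapped layer initially reproduces the frozen layer exactly, so training begins from the pretrained model's behavior.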
“…Adapter: Freeze the pre-trained model and train a residual Adapter (Houlsby et al., 2019). ParallelAdapter: a variant that transfers the parallel insertion of prefix tuning into adapters (He et al., 2021). Prompt-tuning (CLS/VER): only tunes soft prompts with a frozen language model (Lester et al., 2021), with the prompt applied at the transformer's first layer.…”
Section: Baseline Models (mentioning)
confidence: 99%
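The "parallel insertion" that distinguishes ParallelAdapter (He et al., 2021) from the sequential adapter of Houlsby et al. (2019) can be sketched as follows, again only as an illustration under assumed names and defaults: the adapter branch reads the sublayer's input, and its scaled output is added to the frozen sublayer's output rather than being applied afterwards.

```python
import torch
import torch.nn as nn

class ParallelAdapterBlock(nn.Module):
    """Parallel adapter sketch: a bottleneck branch runs alongside a frozen
    pretrained sublayer (e.g. an FFN block) and its output is added to the
    sublayer's output, scaled by a constant factor."""
    def __init__(self, sublayer: nn.Module, d_model: int,
                 bottleneck: int = 64, scale: float = 4.0):
        super().__init__()
        self.sublayer = sublayer              # frozen pretrained sublayer
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sublayer(x) + self.scale * self.up(self.act(self.down(x)))
```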
“…The level of catastrophic forgetting in EANN (Wang et al., 2018) is somewhat reduced compared to Fine-tuning but is still severe. Prompt-tuning and p-tuning v2 are somewhat related to the adapter method in the form of parameter tuning (He et al., 2021), but their performance in CL differs. The prompt-based model is better than the adapter on both datasets.…”
Section: Main Experiments (mentioning)
confidence: 99%
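For completeness, here is a minimal sketch of the prompt-tuning setup referenced in the last two excerpts (Lester et al., 2021), assuming a PyTorch embedding layer; the wrapper name, prompt length, and initialization scale are illustrative assumptions rather than the original implementation.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Prompt tuning sketch: a small matrix of learnable prompt embeddings is
    prepended to the token embeddings, and only this matrix is trained while
    the language model (and its embedding table) stays frozen."""
    def __init__(self, embed: nn.Embedding, num_prompt_tokens: int = 20):
        super().__init__()
        self.embed = embed
        for p in self.embed.parameters():
            p.requires_grad = False
        d_model = embed.embedding_dim
        self.prompt = nn.Parameter(torch.randn(num_prompt_tokens, d_model) * 0.02)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        tok = self.embed(input_ids)                               # (batch, seq, d_model)
        prompt = self.prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return torch.cat([prompt, tok], dim=1)                    # fed to the frozen LM
```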