Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
DOI: 10.18653/v1/2021.acl-long.378

Parameter-Efficient Transfer Learning with Diff Pruning

Abstract: The large size of pretrained networks makes them difficult to deploy for multiple tasks in storage-constrained settings. Diff pruning enables parameter-efficient transfer learning that scales well with new tasks. The approach learns a task-specific "diff" vector that extends the original pretrained parameters. This diff vector is adaptively pruned during training with a differentiable approximation to the L0-norm penalty to encourage sparsity. As the number of tasks increases, diff pruning remains parameter-…
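The mechanism described in the abstract can be illustrated with a short, hedged PyTorch sketch: a frozen pretrained weight is extended by a task-specific diff vector whose entries are gated by a hard-concrete relaxation (Louizos et al., 2018), the standard differentiable surrogate for the L0 penalty. The class and method names below (DiffPrunedLinear, expected_l0) are illustrative and not taken from the paper's code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0  # standard hard-concrete stretch/temperature constants

class DiffPrunedLinear(nn.Module):
    """Frozen pretrained weight + gated task-specific diff (illustrative, not the authors' code)."""

    def __init__(self, pretrained_weight: torch.Tensor):
        super().__init__()
        self.register_buffer("w_pretrained", pretrained_weight)             # stays fixed
        self.w_diff = nn.Parameter(torch.zeros_like(pretrained_weight))     # task-specific diff
        self.log_alpha = nn.Parameter(torch.zeros_like(pretrained_weight))  # gate logits

    def gate(self) -> torch.Tensor:
        # Hard-concrete relaxation of a Bernoulli gate (Louizos et al., 2018).
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / BETA)
        else:
            s = torch.sigmoid(self.log_alpha)
        return (s * (ZETA - GAMMA) + GAMMA).clamp(0.0, 1.0)

    def expected_l0(self) -> torch.Tensor:
        # Differentiable surrogate for ||diff||_0, added to the task loss as a sparsity penalty.
        return torch.sigmoid(self.log_alpha - BETA * math.log(-GAMMA / ZETA)).sum()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_task = self.w_pretrained + self.gate() * self.w_diff  # theta_task = theta_pretrained + delta
        return F.linear(x, w_task)
```

Under this reading of the abstract, only the nonzero entries of the gated diff need to be stored per task, which is what makes the approach scale with the number of tasks.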

Cited by 89 publications (98 citation statements)
References 38 publications
“…Alternatively, some methods fix the entire PLM and introduce a small number of new trainable parameters. Notable examples in this category include adapter-tuning (Houlsby et al., 2019) and its extensions (Pfeiffer et al., 2021), prefix-tuning (Li and Liang, 2021) and its extensions (Lester et al., 2021), and additive methods (Zhang et al., 2020; Guo et al., 2021; Hu et al., 2021).…”
Section: PELT Methods w/ Additional Parameters (mentioning)
confidence: 99%
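The "fix the PLM, add a few trainable parameters" family described in the statement above can be made concrete with a minimal adapter-style sketch. This is an assumed illustration of the general idea, not code from Houlsby et al. (2019); BottleneckAdapter and its dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small trainable module inserted into a layer of a frozen PLM (illustrative names)."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Near-identity initialization: the frozen model's behavior is unchanged at the start.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck adds only ~2 * hidden_dim * bottleneck_dim new parameters per layer.
        return h + self.up(torch.relu(self.down(h)))
```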
“…Additive PELT methods treat the model parameters after fine-tuning as an addition of the pre-trained parameters θ_pre-trained and task-specific differences δ_task, where θ_pre-trained is fixed and a new (sub)set of model parameters is added on top (θ_task = θ_pre-trained + δ_task). There are various ways to parameterize the task-specific differences δ_task, leading to different additive methods such as LoRA (Hu et al., 2021), diff pruning (Guo et al., 2021), and side-tuning (Zhang et al., 2020). We plan to incorporate additive methods into UNIPELT in the next version.…”
Section: PELT Methods w/ Additional Parameters (mentioning)
confidence: 99%
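The additive view θ_task = θ_pre-trained + δ_task quoted above admits different parameterizations of δ_task. Below is a minimal sketch, assuming illustrative names (LowRankDelta, AdditiveLinear): a LoRA-style low-rank update added to a frozen weight. The diff-pruning sketch earlier shows the sparse, gated alternative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankDelta(nn.Module):
    """delta_task parameterized as a rank-r product B @ A (LoRA-style); zero at initialization."""

    def __init__(self, out_dim: int, in_dim: int, rank: int = 8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))  # zero init so delta starts at 0

    def forward(self) -> torch.Tensor:
        return self.B @ self.A

class AdditiveLinear(nn.Module):
    """theta_task = theta_pre-trained (frozen) + delta_task (trainable)."""

    def __init__(self, pretrained_weight: torch.Tensor, delta: nn.Module):
        super().__init__()
        self.register_buffer("w0", pretrained_weight)  # pretrained weight stays fixed
        self.delta = delta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.w0 + self.delta())

# Usage: a 768x768 frozen weight with a rank-8 additive update.
layer = AdditiveLinear(torch.randn(768, 768), LowRankDelta(768, 768, rank=8))
y = layer(torch.randn(2, 768))
```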