Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
DOI: 10.18653/v1/2021.acl-long.378

Parameter-Efficient Transfer Learning with Diff Pruning

Abstract: The large size of pretrained networks makes them difficult to deploy for multiple tasks in storage-constrained settings. Diff pruning enables parameter-efficient transfer learning that scales well with new tasks. The approach learns a task-specific "diff" vector that extends the original pretrained parameters. This diff vector is adaptively pruned during training with a differentiable approximation to the L0-norm penalty to encourage sparsity. As the number of tasks increases, diff pruning remains parameter-…
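The mechanism described in the abstract can be illustrated with a short, hedged PyTorch sketch: a frozen pretrained weight is extended by a task-specific diff vector whose entries are gated by a hard-concrete relaxation (Louizos et al., 2018), the standard differentiable surrogate for the L0 penalty. The class and method names below (DiffPrunedLinear, expected_l0) are illustrative and not taken from the paper's code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0  # standard hard-concrete stretch/temperature constants

class DiffPrunedLinear(nn.Module):
    """Frozen pretrained weight + gated task-specific diff (illustrative, not the authors' code)."""

    def __init__(self, pretrained_weight: torch.Tensor):
        super().__init__()
        self.register_buffer("w_pretrained", pretrained_weight)             # stays fixed
        self.w_diff = nn.Parameter(torch.zeros_like(pretrained_weight))     # task-specific diff
        self.log_alpha = nn.Parameter(torch.zeros_like(pretrained_weight))  # gate logits

    def gate(self) -> torch.Tensor:
        # Hard-concrete relaxation of a Bernoulli gate (Louizos et al., 2018).
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / BETA)
        else:
            s = torch.sigmoid(self.log_alpha)
        return (s * (ZETA - GAMMA) + GAMMA).clamp(0.0, 1.0)

    def expected_l0(self) -> torch.Tensor:
        # Differentiable surrogate for ||diff||_0, added to the task loss as a sparsity penalty.
        return torch.sigmoid(self.log_alpha - BETA * math.log(-GAMMA / ZETA)).sum()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_task = self.w_pretrained + self.gate() * self.w_diff  # theta_task = theta_pretrained + delta
        return F.linear(x, w_task)
```

Under this reading of the abstract, only the nonzero entries of the gated diff need to be stored per task, which is what makes the approach scale with the number of tasks.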

Cited by 89 publications (98 citation statements)
References 38 publications
“…Alternatively, some methods fix the entire PLM and introduce a small number of new trainable parameters. Notable examples in this category include adapter-tuning (Houlsby et al., 2019) and its extensions (Pfeiffer et al., 2021), prefix-tuning (Li and Liang, 2021) and its extensions (Lester et al., 2021), and additive methods (Zhang et al., 2020; Guo et al., 2021; Hu et al., 2021).…”
Section: PELT Methods w/ Additional Parameters (mentioning)
confidence: 99%
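The "fix the PLM, add a few trainable parameters" family described in the statement above can be made concrete with a minimal adapter-style sketch. This is an assumed illustration of the general idea, not code from Houlsby et al. (2019); BottleneckAdapter and its dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small trainable module inserted into a layer of a frozen PLM (illustrative names)."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Near-identity initialization: the frozen model's behavior is unchanged at the start.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck adds only ~2 * hidden_dim * bottleneck_dim new parameters per layer.
        return h + self.up(torch.relu(self.down(h)))
```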
“…Additive PELT methods treat the model parameters after fine-tuning as an addition of the pre-trained parameters θ_pre-trained and task-specific differences δ_task, where θ_pre-trained is fixed and a new (sub)set of model parameters is added on top (θ_task = θ_pre-trained + δ_task). There are various ways to parameterize the task-specific differences δ_task, leading to different additive methods such as LoRA (Hu et al., 2021), diff pruning (Guo et al., 2021), and side-tuning (Zhang et al., 2020). We plan to incorporate additive methods into UNIPELT in the next version.…”
Section: PELT Methods w/ Additional Parameters (mentioning)
confidence: 99%
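The additive view θ_task = θ_pre-trained + δ_task quoted above admits different parameterizations of δ_task. Below is a minimal sketch, assuming illustrative names (LowRankDelta, AdditiveLinear): a LoRA-style low-rank update added to a frozen weight. The diff-pruning sketch earlier shows the sparse, gated alternative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankDelta(nn.Module):
    """delta_task parameterized as a rank-r product B @ A (LoRA-style); zero at initialization."""

    def __init__(self, out_dim: int, in_dim: int, rank: int = 8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))  # zero init so delta starts at 0

    def forward(self) -> torch.Tensor:
        return self.B @ self.A

class AdditiveLinear(nn.Module):
    """theta_task = theta_pre-trained (frozen) + delta_task (trainable)."""

    def __init__(self, pretrained_weight: torch.Tensor, delta: nn.Module):
        super().__init__()
        self.register_buffer("w0", pretrained_weight)  # pretrained weight stays fixed
        self.delta = delta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.w0 + self.delta())

# Usage: a 768x768 frozen weight with a rank-8 additive update.
layer = AdditiveLinear(torch.randn(768, 768), LowRankDelta(768, 768, rank=8))
y = layer(torch.randn(2, 768))
```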