Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.467

Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?

Abstract: While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations. We further evaluate al…
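The recipe described in the abstract is a two-stage fine-tuning pipeline: fine-tune a pretrained encoder on a data-rich intermediate task, then fine-tune the resulting checkpoint on the target task. The sketch below illustrates this idea with Hugging Face Transformers and toy data; the checkpoint name roberta-intermediate, the tiny example sentences, and the label counts are placeholders for illustration, not the paper's actual experimental setup.

```python
# Minimal sketch of intermediate-task transfer (not the authors' code):
# Stage 1 fine-tunes RoBERTa on a data-rich intermediate task; Stage 2
# fine-tunes the resulting checkpoint on the target task.
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def make_loader(texts, labels, batch_size=2):
    # Toy stand-in for a real intermediate/target dataset such as MNLI.
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    ds = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
    return DataLoader(ds, batch_size=batch_size, shuffle=True)

def finetune(model, loader, epochs=1, lr=2e-5):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).train()
    optimizer = AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for input_ids, attention_mask, labels in loader:
            out = model(input_ids=input_ids.to(device),
                        attention_mask=attention_mask.to(device),
                        labels=labels.to(device))
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model

# Stage 1: intermediate-task training (toy 3-way data).
inter_loader = make_loader(["a dog runs", "a cat sleeps", "birds fly", "fish swim"], [0, 1, 2, 0])
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)
model = finetune(model, inter_loader)
model.save_pretrained("roberta-intermediate")

# Stage 2: target-task fine-tuning; a fresh 2-way classifier head replaces
# the intermediate head (ignore_mismatched_sizes handles the size change).
target_loader = make_loader(["great movie", "terrible movie"], [1, 0])
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-intermediate", num_labels=2, ignore_mismatched_sizes=True)
model = finetune(model, target_loader)
```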

Cited by 138 publications (99 citation statements)
References 39 publications
“…task/transfer learning, often improves over standard single-task learning (Ruder, 2017). Within multitask learning, several works (e.g., Luong et al., 2016; Liu et al., 2019b; Raffel et al., 2020) (Pruksachatkun et al., 2020).…”
Section: Related Work (mentioning)
confidence: 99%
“…As an example, Phang et al (2018) show that downstream accuracy can benefit from an intermediate fine-tuning task, but leave the investigation of why certain tasks benefit from intermediate task training to future work. Recently, Pruksachatkun et al (2020) extended this approach using eleven diverse intermediate fine-tuning tasks. They view probing task performance after finetuning as an indicator of the acquisition of a particular language skill during intermediate task finetuning.…”
Section: Related Work (mentioning)
confidence: 99%
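The probing protocol mentioned in this snippet is commonly implemented by freezing the intermediate-task-trained encoder and training only a lightweight classifier on its representations, so that probe accuracy reflects what the encoder itself has learned. Below is a minimal linear-probe sketch under that assumption; the roberta-intermediate checkpoint, the toy word-order examples, and the binary label set are illustrative placeholders, not the setup used in the cited papers.

```python
# Illustrative linear-probe sketch: freeze an intermediate-task-trained
# encoder and train only a linear classifier on its sentence representation;
# probe accuracy is then read as a proxy for the skill being measured.
import torch
from transformers import AutoModel, AutoTokenizer

encoder = AutoModel.from_pretrained("roberta-intermediate")  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
for p in encoder.parameters():   # keep the encoder frozen
    p.requires_grad = False
encoder.eval()

probe = torch.nn.Linear(encoder.config.hidden_size, 2)  # toy binary probing task
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# Toy word-order probing examples (placeholders).
texts = ["she gave him the book", "him gave she book the"]
labels = torch.tensor([1, 0])
enc = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    reps = encoder(**enc).last_hidden_state[:, 0]  # <s> token representation

for _ in range(10):  # only the probe's parameters are updated
    loss = loss_fn(probe(reps), labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```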
“…Note that none of the individual tasks in XTREME covers all 40 languages, but much smaller language subsets. We leave an even more general analysis that combines transfer both across tasks (Pruksachatkun et al., 2020; Glavaš and Vulić, 2020) and across languages for future work.…”
mentioning
confidence: 99%