Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019)
DOI: 10.18653/v1/n19-1213

An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models

Abstract: A growing number of state-of-the-art transfer learning methods employ language models pretrained on large generic corpora. In this paper we present a conceptually simple and effective transfer learning approach that addresses the problem of catastrophic forgetting. Specifically, we combine the task-specific optimization function with an auxiliary language model objective, which is adjusted during the training process. This preserves language regularities captured by language models, while enabling sufficient adaptation for solving the target task.
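
The abstract describes combining the task-specific optimization function with an auxiliary language-model objective whose weight is adjusted during training. Below is a minimal PyTorch sketch of that idea; the tiny LSTM encoder, the exponential decay schedule (gamma0, decay), and the dummy data are assumptions made for illustration, not the paper's actual architecture or hyperparameters.

import torch
import torch.nn as nn

class ClassifierWithLMHead(nn.Module):
    """Shared encoder with a task head and an auxiliary language-model head."""
    def __init__(self, vocab_size=1000, emb_dim=64, hidden=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab_size)    # auxiliary LM objective
        self.cls_head = nn.Linear(hidden, n_classes)    # task-specific objective

    def forward(self, tokens):
        states, _ = self.encoder(self.embed(tokens))    # (batch, time, hidden)
        return self.lm_head(states), self.cls_head(states[:, -1])

model = ClassifierWithLMHead()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
gamma0, decay = 1.0, 0.95                               # assumed annealing schedule

for step in range(100):                                 # dummy data stands in for a real task
    tokens = torch.randint(0, 1000, (8, 20))
    labels = torch.randint(0, 2, (8,))
    lm_logits, cls_logits = model(tokens)
    # next-token LM loss on the same batch (targets are the inputs shifted by one)
    lm_loss = criterion(lm_logits[:, :-1].reshape(-1, 1000), tokens[:, 1:].reshape(-1))
    task_loss = criterion(cls_logits, labels)
    gamma = gamma0 * decay ** step                      # auxiliary weight adjusted during training
    loss = task_loss + gamma * lm_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Decaying gamma lets the language-model signal dominate early and the task loss dominate later, which matches the abstract's description of an auxiliary objective that is "adjusted during the training process".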

Cited by 85 publications (71 citation statements) | References 27 publications

“…Interest in learning general-purpose representations for natural language through unsupervised, multi-task and transfer learning has been skyrocketing lately (Radford et al., 2018; McCann et al., 2018; Chronopoulou et al., 2019; Phang et al., 2018). In parallel to our work, studies that focus on generalization have appeared on publication servers, empirically studying generalization to multiple tasks (Yogatama et al., 2019; Liu et al., 2019).…”
Section: Related Work (mentioning)
Confidence: 74%
“…Howard and Ruder (2018) proposed to fine-tune the pre-trained LM on sentences from the downstream dataset and showed that this boosts performance on the downstream task. Chronopoulou et al. (2019) also demonstrated the effectiveness of this fine-tuning method.…”
Section: Language Model Fine-tuning (mentioning)
Confidence: 84%
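
The snippet above refers to continuing language-model training on text from the downstream dataset before solving the target task. A minimal sketch of such continued LM fine-tuning follows, assuming the Hugging Face transformers library with GPT-2 as a stand-in pretrained LM and placeholder in-domain sentences; the cited works used different models and training setups.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

downstream_sentences = [                                # placeholder in-domain text
    "The movie was surprisingly good.",
    "Terrible plot, but great acting.",
]

model.train()
for sentence in downstream_sentences:
    batch = tokenizer(sentence, return_tensors="pt")
    # causal LM loss; the model shifts the labels internally for next-token prediction
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
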
“…Specifically, we have the following auxiliary tasks: Masked Language Model. Since the pretraining is usually performed on corpora from restricted domains, it is expected that further pretraining on more diverse domains may improve the generalization capability. Hence, we add an auxiliary task, masked language model (Chronopoulou et al., 2019), in the fine-tuning stage, along with the MRC task. Moreover, we use three corpora from different domains as input for the masked language model: (1) the passages in MRQA in-domain datasets, which include Wikipedia, news and search snippets; (2) the search snippets from Bing.…”
Section: Fine-tuning MRC Models with Multi-task Learning (mentioning)
Confidence: 99%
“…Hence, we incorporate a masked language model using corpora from various domains as an auxiliary task in the fine-tuning phase, along with MRC. A side effect of adding a language modeling objective to MRC is that it can avoid catastrophic forgetting and keep the most useful features learned from the pretraining task (Chronopoulou et al., 2019). Additionally, we explore multi-task learning (Liu et al., 2019) by incorporating supervised datasets from other NLP tasks (e.g.…”
Section: Introduction (mentioning)
Confidence: 99%
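
The last two snippets describe adding a masked language model objective as an auxiliary task during fine-tuning to limit catastrophic forgetting. The sketch below illustrates one way to set this up, assuming BERT-style 15% random masking, a toy Transformer encoder with separate MLM and span-prediction (MRC-style) heads, and equal weighting of the two losses; none of these choices are taken from the cited systems.

import torch
import torch.nn as nn

MASK_ID, VOCAB = 103, 1000                              # assumed [MASK] id and vocabulary size

def mask_tokens(tokens, mask_prob=0.15):
    """Randomly replace tokens with [MASK]; labels are -100 everywhere except masked positions."""
    labels = tokens.clone()
    mask = torch.rand(tokens.shape) < mask_prob
    labels[~mask] = -100                                # ignored by CrossEntropyLoss
    corrupted = tokens.clone()
    corrupted[mask] = MASK_ID
    return corrupted, labels

# Toy encoder with an MLM head and a span-prediction (MRC-style) head.
embed = nn.Embedding(VOCAB, 64)
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(64, 4, batch_first=True), 2)
mlm_head = nn.Linear(64, VOCAB)
span_head = nn.Linear(64, 2)                            # start/end logits per token
criterion = nn.CrossEntropyLoss(ignore_index=-100)

tokens = torch.randint(0, VOCAB, (4, 32))               # fake passage/question batch
start_gold = torch.randint(0, 32, (4,))                 # fake answer-span boundaries
end_gold = torch.randint(0, 32, (4,))

corrupted, mlm_labels = mask_tokens(tokens)
hidden = encoder(embed(corrupted))                      # (batch, time, 64)
mlm_loss = criterion(mlm_head(hidden).reshape(-1, VOCAB), mlm_labels.reshape(-1))
start_logits, end_logits = span_head(hidden).split(1, dim=-1)
mrc_loss = criterion(start_logits.squeeze(-1), start_gold) + criterion(end_logits.squeeze(-1), end_gold)
loss = mrc_loss + mlm_loss                              # joint objective during fine-tuning
loss.backward()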