2021
DOI: 10.48550/arxiv.2106.14282
Preprint

A Closer Look at How Fine-tuning Changes BERT

Abstract: Given the prevalence of pre-trained contextualized representations in today's NLP, there have been several efforts to understand what information such representations contain. A common strategy to use such representations is to fine-tune them for an end task. However, how fine-tuning for a task changes the underlying space is less studied. In this work, we study the English BERT family and use two probing techniques to analyze how fine-tuning changes the space. Our experiments reveal that fine-tuning improves …

Cited by 3 publications (3 citation statements)
References 29 publications

“…This idea is echoed in recent machine learning literature, which has shown that it is possible to quickly adapt large pre-trained networks to a broad range of downstream tasks of interest via “fine-tuning” paradigms (Brown et al., 2020; Radford et al., 2019; Reid et al., 2022). As the name suggests, fine-tuning induces only small changes in the network representations (Zhou and Srikumar, 2021), suggesting that representations in the network can be quickly “re-associated” with new functionality for the downstream tasks.…”
Section: Discussion
confidence: 99%
“…Additional output layers are added to the model, each specifically tailored for NLP tasks. This phase is devoted to optimizing BERT's broad language interpretation for tasks, necessitating smaller, more focused datasets [56,57].…”
Section: Bidirectional Encoder Representations From Transformers (BERT)
confidence: 99%
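
For readers unfamiliar with the fine-tuning setup described in the excerpt above, the following is a minimal sketch of how a task-specific output layer is typically added on top of pre-trained BERT and trained on a small labelled dataset. It assumes the HuggingFace transformers library; the model name, task, and data are illustrative placeholders, not the setup used in the cited works.

# Minimal fine-tuning sketch: a fresh classification head is placed on top of
# pre-trained BERT and the whole model is trained on a small labelled dataset.
# The task (binary sentiment) and the two examples are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # adds a new task-specific output layer
)

texts = ["a great movie", "a dull movie"]   # placeholder examples
labels = torch.tensor([1, 0])               # placeholder labels

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                          # a few passes over the tiny dataset
    optimizer.zero_grad()
    outputs = model(**inputs, labels=labels)  # returns loss and logits
    outputs.loss.backward()
    optimizer.step()

Note that, unlike linear probing, every encoder parameter receives gradient updates here, which is precisely what allows fine-tuning to reshape the representation space the paper analyzes.
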
“…Linear Probing. Most of the works on updating pre-trained models have been mainly studied for language tasks (Dodge et al., 2020; Zhao et al., 2021; Zhou & Srikumar, 2021). In general computer vision setting, transfer learning and updating methods gained much attention (Zhai et al., 2019; Kornblith et al., 2019; Ericsson et al., 2021; Data Augmentation in FSL.…”
Section: Related Work
confidence: 99%
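
The "linear probing" named in the excerpt above refers to training only a lightweight linear classifier on top of frozen pre-trained representations. The sketch below illustrates the general idea under assumed choices (bert-base-uncased, a two-class toy task with placeholder data); it is not the specific protocol of any of the cited papers.

# Generic linear-probing sketch: the pre-trained encoder is frozen and only a
# linear classifier over its [CLS] representation is trained. Model name, task,
# and data are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
for p in encoder.parameters():          # freeze all encoder weights
    p.requires_grad = False

probe = torch.nn.Linear(encoder.config.hidden_size, 2)   # trainable linear head
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)

texts = ["a great movie", "a dull movie"]   # placeholder data
labels = torch.tensor([1, 0])
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

for _ in range(10):
    with torch.no_grad():
        cls = encoder(**inputs).last_hidden_state[:, 0]   # frozen [CLS] vectors
    logits = probe(cls)
    loss = torch.nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Because the encoder stays frozen, probe accuracy reflects only what is linearly recoverable from the representations themselves, which is why probing of this kind is used to analyze what a representation space encodes.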