Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.75

A Closer Look at How Fine-tuning Changes BERT

Abstract: Given the prevalence of pre-trained contextualized representations in today's NLP, there have been many efforts to understand what information they contain, and why they seem to be universally successful. The most common approach to use these representations involves fine-tuning them for an end task. Yet, how fine-tuning changes the underlying embedding space is less studied. In this work, we study the English BERT family and use two probing techniques to analyze how fine-tuning changes the space. We hypothesi…

Cited by 21 publications (24 citation statements) | References 44 publications
“…This is done by adding a classification layer on top of the pretrained model with output neurons for the different classes (e.g., populist and non-populist paragraphs), without the need for the intermediate step of encoding the documents themselves in vector form (hence the absence of the horizontal bar in the fourth diagram of Figure 1). Using human-annotated data, the model is then trained for a few additional epochs, during which the model parameters are adapted via gradient descent to specialize in the classification task at hand (Zhou and Srikumar 2021). Metaphorically speaking, rather than teaching a model how to speak English and how to identify, say, populism at the same time—as would have been the case had we relied on raw word frequency vectors or locally trained embeddings—we are teaching a model that already speaks English how to identify populism.…”
Section: Measuring Political Frames in Texts
confidence: 99%
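The procedure quoted above (a classification layer added on top of a pretrained model, then a few additional epochs of gradient descent on human-annotated data) can be sketched in code. The snippet below is a minimal illustration using the Hugging Face transformers library; the label set, example texts, and hyperparameters are placeholders, not details from the cited study.

```python
# Minimal fine-tuning sketch: a classification head on top of pretrained BERT.
# Labels, texts, and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g., populist vs. non-populist paragraphs
)

texts = ["example paragraph one", "example paragraph two"]  # human-annotated data
labels = torch.tensor([1, 0])                               # gold labels

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):  # "a few additional epochs"
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)  # cross-entropy loss over the new head
    outputs.loss.backward()                  # gradient descent adapts the parameters
    optimizer.step()
```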
“…Fine-tuning a pretrained language model for an end task is a widely used strategy for quickly and efficiently building a model for that task with limited labeled data. Zhou and Srikumar (2021) find that fine-tuning reconfigures the underlying semantic space to adjust pretrained representations to downstream tasks. In view of this, we take the sentence-level textual stimuli from cognitive data as input to a task-specific fine-tuned model to obtain representations that contain information specific to that task.…”
Section: Task-specific Sentence Representations
confidence: 90%
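A minimal sketch of the idea in this excerpt, extracting sentence-level representations from a fine-tuned encoder, might look as follows. The checkpoint path ./finetuned-bert and the choice of [CLS] or mean pooling are illustrative assumptions, not specifics from the cited work.

```python
# Sketch: obtain sentence representations from a fine-tuned encoder.
# "./finetuned-bert" is a hypothetical local checkpoint directory.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("./finetuned-bert")
encoder = AutoModel.from_pretrained("./finetuned-bert")

stimuli = ["The cat sat on the mat.", "She read the book quickly."]
batch = tokenizer(stimuli, padding=True, truncation=True, return_tensors="pt")

encoder.eval()
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state   # (batch, seq_len, dim)

cls_vectors = hidden[:, 0, :]                     # [CLS] token as sentence vector
mask = batch["attention_mask"].unsqueeze(-1)      # ignore padding tokens
mean_vectors = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # or mean-pool
print(cls_vectors.shape, mean_vectors.shape)
```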
“…This observation suggests that the embedding space changes the most during the initial fine-tuning batch updates, which is consistent with findings from Zhou and Srikumar [10]. Second, the magnitude of change in the topological structure is greater in later layers (e.g. layers 9 and 12) than in earlier ones (e.g.…”
Section: Organization and Evolution of Embeddings During Fine-tuning
confidence: 96%
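The layer-wise comparison described above could be approximated as below: embed the same sentences with the pretrained and the fine-tuned model and measure how far each layer's representations move. The checkpoint path is hypothetical, and mean cosine distance is used here as a simple stand-in for the topological measure in the cited analysis.

```python
# Sketch: per-layer change between a pretrained and a fine-tuned BERT.
# "./finetuned-bert" is a hypothetical checkpoint; cosine distance is a
# stand-in for the topological measure used in the cited analysis.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
base = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
tuned = AutoModel.from_pretrained("./finetuned-bert", output_hidden_states=True)

sentences = ["A small probe set of sentences.",
             "Fine-tuning changes later layers more."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

base.eval(); tuned.eval()
with torch.no_grad():
    h_base = base(**batch).hidden_states    # tuple: embeddings + one tensor per layer
    h_tuned = tuned(**batch).hidden_states

for layer, (hb, ht) in enumerate(zip(h_base, h_tuned)):
    # Average cosine similarity between corresponding token vectors.
    sim = torch.nn.functional.cosine_similarity(hb, ht, dim=-1).mean().item()
    print(f"layer {layer:2d}: mean change = {1 - sim:.4f}")
```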
“…[74–76] For example, Hewitt and Manning [77] showed that syntactic dependency relationships can be recovered from BERT embeddings by a simple linear transformation, and Ethayarajh [78] showed that the vectors occupy a narrow cone in the embedding space. Fine-tuning a model for a specific task is common practice, but there are limited insights [10, 79–82] into the process of fine-tuning. Specifically, few studies have attempted to understand how fine-tuning affects the model parameters and internal embeddings.…”
Section: Probing Embeddings in NLP
confidence: 99%
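To illustrate the "narrow cone" observation attributed to Ethayarajh, the sketch below averages pairwise cosine similarities between contextual token vectors drawn from unrelated sentences; a value well above zero suggests the vectors are anisotropic. The sentence sample and the use of the last layer are arbitrary choices for illustration.

```python
# Sketch: anisotropy ("narrow cone") check on contextual embeddings.
# Averages cosine similarity between token vectors from different sentences;
# values well above zero indicate the vectors occupy a narrow cone.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["The stock market fell sharply today.",
             "Penguins huddle together to stay warm.",
             "She compiled the report before lunch."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, dim)

mask = batch["attention_mask"].bool()
vectors = hidden[mask]                                # all non-padding token vectors
vectors = torch.nn.functional.normalize(vectors, dim=-1)
sims = vectors @ vectors.T                            # pairwise cosine similarities
off_diag = sims[~torch.eye(len(vectors), dtype=torch.bool)]
print(f"mean pairwise cosine similarity: {off_diag.mean().item():.3f}")
```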