Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.75

A Closer Look at How Fine-tuning Changes BERT

Abstract: Given the prevalence of pre-trained contextualized representations in today's NLP, there have been many efforts to understand what information they contain, and why they seem to be universally successful. The most common approach to use these representations involves fine-tuning them for an end task. Yet, how fine-tuning changes the underlying embedding space is less studied. In this work, we study the English BERT family and use two probing techniques to analyze how fine-tuning changes the space. We hypothesi…

Cited by 21 publications (24 citation statements) | References 44 publications
“…This is done by adding a classification layer on top of the pretrained model with output neurons for the different classes (e.g., populist and non-populist paragraphs), without the need for the intermediate step of encoding the documents themselves in vector form (hence the absence of the horizontal bar in the fourth diagram of Figure 1). Using human-annotated data, the model is then trained for a few additional epochs, during which the model parameters are adapted via gradient descent to specialize in the classification task at hand (Zhou and Srikumar 2021). Metaphorically speaking, rather than teaching a model how to speak English and how to identify, say, populism at the same time—as would have been the case had we relied on raw word frequency vectors or locally trained embeddings—we are teaching a model that already speaks English how to identify populism.…”
Section: Measuring Political Frames in Texts
confidence: 99%
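The procedure quoted above (a classification layer added on top of a pretrained model, then a few additional epochs of gradient descent on human-annotated data) can be sketched in code. The snippet below is a minimal illustration using the Hugging Face transformers library; the label set, example texts, and hyperparameters are placeholders, not details from the cited study.

```python
# Minimal fine-tuning sketch: a classification head on top of pretrained BERT.
# Labels, texts, and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g., populist vs. non-populist paragraphs
)

texts = ["example paragraph one", "example paragraph two"]  # human-annotated data
labels = torch.tensor([1, 0])                               # gold labels

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):  # "a few additional epochs"
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)  # cross-entropy loss over the new head
    outputs.loss.backward()                  # gradient descent adapts the parameters
    optimizer.step()
```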
“…Fine-tuning a pretrained language model for an end task is a widely used strategy for quickly and efficiently building a model for that task with limited labeled data. Zhou and Srikumar (2021) find that fine-tuning reconfigures the underlying semantic space to adjust pretrained representations to downstream tasks. In view of this, we take the sentence-level textual stimuli from cognitive data as input to a task-specific fine-tuned model to obtain representations that contain information specific to that task.…”
Section: Task-specific Sentence Representations
confidence: 90%
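A minimal sketch of the idea in this excerpt, extracting sentence-level representations from a fine-tuned encoder, might look as follows. The checkpoint path ./finetuned-bert and the choice of [CLS] or mean pooling are illustrative assumptions, not specifics from the cited work.

```python
# Sketch: obtain sentence representations from a fine-tuned encoder.
# "./finetuned-bert" is a hypothetical local checkpoint directory.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("./finetuned-bert")
encoder = AutoModel.from_pretrained("./finetuned-bert")

stimuli = ["The cat sat on the mat.", "She read the book quickly."]
batch = tokenizer(stimuli, padding=True, truncation=True, return_tensors="pt")

encoder.eval()
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state   # (batch, seq_len, dim)

cls_vectors = hidden[:, 0, :]                     # [CLS] token as sentence vector
mask = batch["attention_mask"].unsqueeze(-1)      # ignore padding tokens
mean_vectors = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # or mean-pool
print(cls_vectors.shape, mean_vectors.shape)
```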
“…This observation suggests that the embedding space changes the most during the initial fine-tuning batch updates, which is consistent with findings from Zhou and Srikumar [10]. Second, the magnitude of change in the topological structure is greater in later layers (e.g. layers 9 and 12) than in earlier ones (e.g.…”
Section: Organization and Evolution of Embeddings During Fine-tuning
confidence: 96%
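The layer-wise comparison described above could be approximated as below: embed the same sentences with the pretrained and the fine-tuned model and measure how far each layer's representations move. The checkpoint path is hypothetical, and mean cosine distance is used here as a simple stand-in for the topological measure in the cited analysis.

```python
# Sketch: per-layer change between a pretrained and a fine-tuned BERT.
# "./finetuned-bert" is a hypothetical checkpoint; cosine distance is a
# stand-in for the topological measure used in the cited analysis.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
base = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
tuned = AutoModel.from_pretrained("./finetuned-bert", output_hidden_states=True)

sentences = ["A small probe set of sentences.",
             "Fine-tuning changes later layers more."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

base.eval(); tuned.eval()
with torch.no_grad():
    h_base = base(**batch).hidden_states    # tuple: embeddings + one tensor per layer
    h_tuned = tuned(**batch).hidden_states

for layer, (hb, ht) in enumerate(zip(h_base, h_tuned)):
    # Average cosine similarity between corresponding token vectors.
    sim = torch.nn.functional.cosine_similarity(hb, ht, dim=-1).mean().item()
    print(f"layer {layer:2d}: mean change = {1 - sim:.4f}")
```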
“…[74–76] For example, Hewitt and Manning [77] showed that syntactic dependency relationships can be recovered from BERT embeddings by a simple linear transformation, and Ethayarajh [78] showed that the vectors occupy a narrow cone in the embedding space. Fine-tuning a model for a specific task is common practice, but there are limited insights [10, 79–82] into the process of fine-tuning. Specifically, few studies have attempted to understand how fine-tuning affects the model parameters and internal embeddings.…”
Section: Probing Embeddings in NLP
confidence: 99%
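To illustrate the "narrow cone" observation attributed to Ethayarajh, the sketch below averages pairwise cosine similarities between contextual token vectors drawn from unrelated sentences; a value well above zero suggests the vectors are anisotropic. The sentence sample and the use of the last layer are arbitrary choices for illustration.

```python
# Sketch: anisotropy ("narrow cone") check on contextual embeddings.
# Averages cosine similarity between token vectors from different sentences;
# values well above zero indicate the vectors occupy a narrow cone.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["The stock market fell sharply today.",
             "Penguins huddle together to stay warm.",
             "She compiled the report before lunch."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, dim)

mask = batch["attention_mask"].bool()
vectors = hidden[mask]                                # all non-padding token vectors
vectors = torch.nn.functional.normalize(vectors, dim=-1)
sims = vectors @ vectors.T                            # pairwise cosine similarities
off_diag = sims[~torch.eye(len(vectors), dtype=torch.bool)]
print(f"mean pairwise cosine similarity: {off_diag.mean().item():.3f}")
```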