What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models

2020 | DOI: 10.1162/tacl_a_00298

Abstract: Pre-training by language modeling has become a popular and successful approach to NLP tasks, but we have yet to understand exactly what linguistic capacities these pretraining processes confer upon models. In this paper we introduce a suite of diagnostics drawn from human language experiments, which allow us to ask targeted questions about information used by language models for generating predictions in context. As a case study, we apply these diagnostics to the popular BERT model, finding that it can general…
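To make the abstract's setup concrete, the following is a minimal sketch, not the paper's released code, of the kind of cloze-style diagnostic it describes: querying BERT's masked-token predictions in context and contrasting an affirmative sentence with its negated counterpart, the negation behavior that citing work below singles out. The Hugging Face fill-mask pipeline, the bert-base-uncased checkpoint, and the example sentences are illustrative assumptions, not the authors' materials.

# Minimal sketch of a cloze-style diagnostic (illustrative, not the paper's code):
# query BERT's masked-token predictions in context and compare an affirmative
# sentence with its negated counterpart.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")  # assumed checkpoint

# A negation probe: a negation-sensitive model should not rank category-true
# completions (e.g. "bird") highly in the negated sentence.
for sentence in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    predictions = fill_mask(sentence, top_k=5)
    top_tokens = [(p["token_str"], round(p["score"], 3)) for p in predictions]
    print(sentence, "->", top_tokens)

In probes of this kind, the paper reports that BERT's top completions change little under negation, which is the insensitivity to negation that the citation statements below refer to.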

Cited by 442 publications (446 citation statements)
References 32 publications
“…At a high level, mechanisms for interpreting BERT can be categorized into three main categories: interpreting the learned embeddings [1,16,23,49,86], BERT's learned knowledge of syntax [27,30,32,45,47,76], and BERT's learned knowledge of semantics [24,76].…”
Section: Interpreting Models in NLP (mentioning)
confidence: 99%
“…These high performance levels typically come at the cost of decreased interpretability. Such neural nets are notoriously prone to learning irrelevant correlations (Ettinger, 2020; Futrell et al., 2019; Kuncoro et al., 2018; van Schijndel, Mueller, & Linzen, 2019). To avoid this problem and focus our investigation more squarely on structural constraints like locality in Grodner and Gibson (2005) and non‐structural factors such as animacy in Traxler et al. (2002), we instead proceed with an explicit grammar whose generalization ability rests upon well‐chosen syntactic analyses.…”
Section: From Grammar to Processing Difficulty Predictions (mentioning)
confidence: 99%
“…However, it is now known that these models lack reasoning capabilities, often simply exploiting statistical artifacts in the data sets instead of actually understanding language (Niven and Kao, 2019; McCoy et al., 2019). Moreover, Ettinger (2020) found that the popular BERT model (Devlin et al., 2019) completely failed to acquire a general understanding of negation. Relatedly, Bender and Koller (2020) contend that meaning cannot be learned from form alone, and argue for approaches that focus on grounding language (communication) in the real world.…”
Section: Introduction (mentioning)
confidence: 99%