Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)
DOI: 10.18653/v1/n19-1004

Neural language models as psycholinguistic subjects: Representations of syntactic state

Abstract: We deploy the methods of controlled psycholinguistic experimentation to shed light on the extent to which the behavior of neural network language models reflects incremental representations of syntactic state. To do so, we examine model behavior on artificial sentences containing a variety of syntactically complex structures. We test four models: two publicly available LSTM sequence models of English (Jozefowicz et al., 2016; Gulordava et al., 2018) trained on large datasets; an RNNG (Dyer et al., 2016) trained…
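Evaluations of this kind typically compare how predictable a critical word is for the model across minimally different sentence conditions, usually expressed as surprisal. Below is a minimal sketch of that surprisal computation; it is an illustration only, and the example sentences and probability values are hypothetical, not taken from the paper.

```python
import math

def surprisal(probabilities):
    """Convert per-word conditional probabilities P(w_i | context) into
    surprisal values in bits: S(w_i) = -log2 P(w_i | context)."""
    return [-math.log2(p) for p in probabilities]

# Hypothetical probabilities a language model might assign to the
# disambiguating verb "fell" in a garden-path vs. an unambiguous control.
garden_path = [0.004]   # "The horse raced past the barn fell."
unambiguous = [0.120]   # "The horse that was raced past the barn fell."

print(surprisal(garden_path))   # ~7.97 bits: higher surprisal expected
print(surprisal(unambiguous))   # ~3.06 bits: lower surprisal expected
```

A model whose incremental state tracks the syntactic ambiguity should show the garden-path/control surprisal difference at the disambiguating word, which is the kind of contrast the paper's controlled materials are designed to elicit.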

Cited by 127 publications (131 citation statements). References 42 publications.

“…Some previous work does use targeted tests to examine specific capacities of LMs, often inspired by psycholinguistic methods. However, the majority of this work has focused on syntactic capabilities of LMs (Linzen et al., 2016; Gulordava et al., 2018; Marvin and Linzen, 2018; Wilcox et al., 2018; Futrell et al., 2019). Relevant to our case study here, using several of these tests Goldberg (2019) shows the BERT model to perform impressively on such syntactic diagnostics.…”
Section: Introduction (mentioning)
confidence: 68%
“…Answering this question is important both for technical outcomes (models with explicit hierarchical structure show performance gains, at least when training on relatively small datasets; Choe and Charniak, 2016) and for the scientific aim of understanding what biases, learning objectives and training regimes led to humanlike linguistic knowledge. Previous work has approached this question by either examining models' internal state (Weiss et al., 2018; Mareček and Rosa, 2018) or by studying model behavior (Elman, 1991; Linzen et al., 2016; Futrell et al., 2019; McCoy et al., 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…Although LSTMs and GRUs have already been applied to account for human language performance measures (Futrell et al., 2019; Goodkind & Bicknell, 2018; Gulordava, Bojanowski, Grave, Linzen, & Baroni, 2018; Hahn & Keller, 2016; McCoy, Frank, & Linzen, 2018; Sakaguchi, Duh, Post, & Durme, 2017; Van Schijndel & Linzen, 2018a, 2018b), the question remains whether they form more accurate cognitive processing models than traditional SRNs, beyond what might be expected from their stronger language modeling abilities.…”
Section: Introduction (mentioning)
confidence: 99%