Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018
DOI: 10.18653/v1/w18-5426
Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information

Abstract: How do neural language models keep track of number agreement between subject and verb? We show that 'diagnostic classifiers', trained to predict number from the internal states of a language model, provide a detailed understanding of how, when, and where this information is represented. Moreover, they give us insight into when and where number information is corrupted in cases where the language model ends up making agreement errors. To demonstrate the causal role played by the representations we find, we then…
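The diagnostic-classifier approach the abstract describes amounts to training a simple supervised probe on the language model's internal states. Below is a minimal sketch of that idea, not the authors' implementation: the `hidden_states` and `number_labels` arrays are hypothetical stand-ins for activations extracted from an LSTM language model and for the grammatical number of each sentence's subject, and the probe is an ordinary scikit-learn logistic regression.

```python
# Minimal sketch of a diagnostic classifier (linear probe).
# `hidden_states` and `number_labels` are placeholders: in a real setup they
# would be LM activations and subject-number annotations, not random data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 650))    # stand-in for LM hidden states
number_labels = rng.integers(0, 2, size=1000)   # 0 = singular, 1 = plural

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, number_labels, test_size=0.2, random_state=0
)

# The diagnostic classifier itself: a linear model trained to predict
# subject number from the internal states of the language model.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```

If such a probe decodes number well above chance, the corresponding states can be said to carry agreement information; the paper goes further and uses these probes to examine where that information is corrupted in sentences where the model makes agreement errors.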

Cited by 147 publications (128 citation statements) | References 11 publications
“…Ettinger et al. (2016, 2017); Zhu et al. (2018), i.a., use a task-based approach similar to ours, where tasks that require a specific subset of linguistic knowledge are used to perform qualitative evaluation. Gulordava et al. (2018), Giulianelli et al. (2018), Rønning et al. (2018), and Jumelet and Hupkes (2018) make a focused contribution towards a particular linguistic phenomenon (agreement, ellipsis, negative polarity). Using recast NLI, Poliak et al. (2018a) probe for semantic phenomena in neural machine translation encoders.…”
Section: Related Work
mentioning confidence: 99%
“…For example, using linear classifiers at each time point during sentence processing, information represented by various units can be decoded, and thus provide evidence about their processing function. Using such ‘Diagnostic Classifiers’ [38], Giulianelli et al. [39] explored whether grammatical-number information can be decoded from the transient state of a neural network. This approach proved beneficial: in particular, it revealed that the representation of grammatical number, as in (5), is mostly stored by the highest layer of the network, and that it can be robustly maintained in network activity.…”
Section: Understanding Capacity Limitation In Light Of Neural Lang…
mentioning confidence: 99%
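The excerpt above highlights decoding at each time point during sentence processing. A sketch of that per-timestep analysis, again with hypothetical placeholder arrays (`states_per_timestep`, `labels`) rather than real activations, could look like this:

```python
# Sketch of per-timestep decoding: one probe per sentence position, whose
# cross-validated accuracy traces where number information is available.
# `states_per_timestep` (sentences x steps x hidden_dim) and `labels` are
# placeholders, not data from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
states_per_timestep = rng.normal(size=(500, 10, 650))
labels = rng.integers(0, 2, size=500)

accuracy_per_step = []
for t in range(states_per_timestep.shape[1]):
    probe = LogisticRegression(max_iter=1000)
    scores = cross_val_score(probe, states_per_timestep[:, t, :], labels, cv=5)
    accuracy_per_step.append(scores.mean())

print(accuracy_per_step)  # decoding accuracy at each sentence position
```

Plotting the resulting accuracies over positions (and repeating the procedure per layer) is one way to see where in the sentence, and in which layer, number information is maintained, which is the kind of analysis the excerpt attributes to Giulianelli et al.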
“…The strong performance of recurrent neural networks (RNNs) in applied natural language processing tasks has motivated an array of studies that have investigated their ability to acquire natural language syntax without syntactic annotations; these studies have identified both strengths (Linzen et al., 2016; Giulianelli et al., 2018; Gulordava et al., 2018; Kuncoro et al., 2018; van Schijndel and Linzen, 2018) and limitations (Chowdhury and Zamparelli, 2018; Marvin and Linzen, 2018).…”
Section: Introduction
mentioning confidence: 99%