Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1108

Colorless Green Recurrent Networks Dream Hierarchically

Abstract: Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate here to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation nonsensical sentences…
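
The evaluation described in the abstract is a minimal-pair test: a trained language model should assign higher probability to the verb form that agrees in number with its (possibly distant) subject than to the mismatched form, even in nonsensical sentences. Below is a minimal sketch of that kind of scoring, not the authors' released evaluation code; the function and variable names (agreement_score, vocab, lm) are illustrative assumptions, and the model interface is whatever maps token ids to next-token logits.

```python
# Sketch (not the paper's code) of long-distance number agreement scoring:
# compare the LM's log-probability of the agreeing vs. non-agreeing verb.
import torch
import torch.nn.functional as F


def agreement_score(model, vocab, prefix_tokens, correct_verb, wrong_verb):
    """Return True if the LM prefers the correctly agreeing verb.

    `model` is assumed to be any callable mapping a LongTensor of token ids
    of shape [1, seq_len] to next-token logits of shape [1, vocab_size];
    `vocab` maps word strings to integer ids.
    """
    ids = torch.tensor([[vocab[w] for w in prefix_tokens]])
    logits = model(ids)                         # [1, vocab_size]
    log_probs = F.log_softmax(logits, dim=-1)
    return (log_probs[0, vocab[correct_verb]] >
            log_probs[0, vocab[wrong_verb]]).item()


# Hypothetical usage on a nonsensical ("colorless green") test item:
# agreement_score(lm, vocab,
#                 "the ideas i ate with the chair".split(),
#                 correct_verb="sleep", wrong_verb="sleeps")
```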

Cited by 405 publications (601 citation statements) | References 35 publications

“…Despite being affected by absolute distances between syntactically dependent tokens (Linzen et al, 2016), LSTMs tend to learn structural information to a certain extent even without being explicitly instructed to do so (Gulordava et al, 2018). Futrell and Levy (2018) discuss linguistic phenomena similar to those we discuss in §4.2, and show that LSTM encoder-decoder systems handle them better than previous N-gram based systems, despite being profoundly affected by distance.…”
Section: Long-distance Dependencies in MT (mentioning)
confidence: 69%
“…Some previous work does use targeted tests to examine specific capacities of LMs, often inspired by psycholinguistic methods. However, the majority of this work has focused on syntactic capabilities of LMs (Linzen et al, 2016; Gulordava et al, 2018; Marvin and Linzen, 2018; Wilcox et al, 2018; Futrell et al, 2019). Relevant to our case study here, using several of these tests Goldberg (2019) shows the BERT model to perform impressively on such syntactic diagnostics.…”
Section: Introduction (mentioning)
confidence: 69%
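
The statement above mentions running the same kind of targeted syntactic diagnostics on BERT (Goldberg, 2019), where a masked language model scores the candidate verb forms in a masked slot rather than predicting the next word. The sketch below illustrates that style of test; the Hugging Face transformers library, the bert-base-uncased checkpoint, the helper name masked_verb_preference, and the example sentence are stand-in assumptions, not details from the cited work.

```python
# Sketch of a masked-LM agreement diagnostic: fill a masked verb slot and
# compare the model's scores for the agreeing vs. non-agreeing verb form.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()


def masked_verb_preference(prefix, suffix, verbs):
    """Return each candidate verb's logit at the masked position."""
    text = f"{prefix} {tokenizer.mask_token} {suffix}".strip()
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    return {v: logits[tokenizer.convert_tokens_to_ids(v)].item() for v in verbs}


# Hypothetical usage on an attractor sentence:
# masked_verb_preference("the keys to the cabinet", "on the table .",
#                        ["are", "is"])
```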
“…In this particular study, we consider the LSTM language model that was made available by Gulordava et al (2018). This language model (LM) is a 2-layer LSTM with 650 hidden units in both layers, trained on a corpus with Wikipedia data.…”
Section: Generalised Contextual Decomposition (mentioning)
confidence: 99%
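
The citation statement above pins down the shape of the released model: two LSTM layers with 650 hidden units each, trained as a word-level language model on Wikipedia data. Below is a minimal PyTorch sketch of a model with that shape; it is not the authors' released code, and the 650-dimensional embeddings, the class and argument names, and the omission of dropout and weight tying are assumptions for illustration.

```python
# Sketch of a 2-layer, 650-unit LSTM language model (shape as described above).
import torch.nn as nn


class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=650, hidden_dim=650, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)  # next-word logits

    def forward(self, token_ids, hidden=None):
        # token_ids: [batch, seq_len] -> logits over the vocabulary per position
        emb = self.embedding(token_ids)
        output, hidden = self.lstm(emb, hidden)
        return self.decoder(output), hidden
```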