Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1592

Quantity doesn’t buy quality syntax with neural language models

Abstract: Recurrent neural networks can learn to predict upcoming words remarkably well on average; in syntactically complex contexts, however, they often assign unexpectedly high probabilities to ungrammatical words. We investigate to what extent these shortcomings can be mitigated by increasing the size of the network and the corpus on which it is trained. We find that gains from increasing network size are minimal beyond a certain point. Likewise, expanding the training corpus yields diminishing returns; we estimate …
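
The failure mode the abstract describes (an LM preferring an ungrammatical word in a complex context) is typically probed with minimal pairs. A minimal sketch of such a probe follows; it uses GPT-2 via Hugging Face transformers purely as a stand-in, since the paper itself trains and evaluates recurrent (LSTM) language models, and the sentence, word choices, and helper name are illustrative assumptions.

```python
# Illustrative sketch only: probing an LM with a minimal pair. The paper
# evaluates LSTM LMs it trains itself; GPT-2 and this sentence are
# assumptions chosen so the example runs out of the box.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Agreement across an "attractor" noun: the head noun "keys" is plural,
# so "are" is grammatical even though "cabinet" (singular) is closer.
prefix = "The keys to the cabinet"

def next_word_prob(prefix: str, word: str) -> float:
    """Probability the model assigns to `word` as the next word."""
    ids = tokenizer.encode(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(ids).logits[0, -1]      # scores for the next token
    probs = torch.softmax(logits, dim=-1)
    word_ids = tokenizer.encode(" " + word)    # leading space = word boundary
    assert len(word_ids) == 1, "choose words that are single BPE tokens"
    return probs[word_ids[0]].item()

p_good = next_word_prob(prefix, "are")  # grammatical
p_bad = next_word_prob(prefix, "is")    # ungrammatical
print(f"P(are)={p_good:.2e}  P(is)={p_bad:.2e}  correct={p_good > p_bad}")
```

A full evaluation repeats this comparison over many minimal pairs and reports the fraction in which the grammatical form receives the higher probability.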

Citation types: 4 supporting, 53 mentioning, 0 contrasting
Cited by 60 publications (57 citation statements)
References 22 publications (32 reference statements)
“…Our results address the three questions posed above: First, for the range of model architectures and dataset sizes tested, we find a substantial dissociation between perplexity and SG score. Second, we find a larger effect of model inductive bias than training data size on SG score, a result that accords with van Schijndel et al. (2019). Models afforded explicit structural supervision during training outperform other models: One structurally supervised model is able to achieve the same SG scores as a purely sequence-based model trained on ∼100 times the number of tokens.…”
Section: Introduction (supporting)
confidence: 76%
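
The excerpt contrasts perplexity with syntactic generalization (SG) scores. As a rough illustration of why the two can dissociate, the sketch below computes both for one model: perplexity averages prediction quality over every token, while a targeted score depends only on the positions that decide grammaticality. GPT-2, the minimal pairs, and the helper names are assumptions for demonstration, not the cited papers' protocol.

```python
# Illustrative sketch (assumed setup, not the cited papers' protocol):
# a model can improve average prediction (perplexity) without improving
# on the targeted contrasts that an SG-style score measures.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(sentence: str) -> float:
    """Corpus-style metric: exp of mean token negative log-likelihood."""
    ids = tokenizer.encode(sentence, return_tensors="pt")
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # transformers shifts labels internally
    return torch.exp(loss).item()

def targeted_score(pairs) -> float:
    """Targeted metric: fraction of minimal pairs where the grammatical
    sentence is assigned higher probability (lower perplexity) than its foil."""
    return sum(perplexity(good) < perplexity(bad) for good, bad in pairs) / len(pairs)

# Hypothetical minimal pairs (grammatical, ungrammatical).
pairs = [
    ("The keys to the cabinet are on the table.",
     "The keys to the cabinet is on the table."),
    ("The author that the critics praise writes well.",
     "The author that the critics praise write well."),
]
print("targeted score:", targeted_score(pairs))
```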
“…These high performance levels typically come at the cost of decreased interpretability. Such neural nets are notoriously prone to learning irrelevant correlations (Ettinger, 2020; Futrell et al., 2019; Kuncoro et al., 2018; van Schijndel, Mueller, & Linzen, 2019). To avoid this problem and focus our investigation more squarely on structural constraints like locality in Grodner and Gibson (2005) and non-structural factors such as animacy in Traxler et al. (2002), we instead proceed with an explicit grammar whose generalization ability rests upon well-chosen syntactic analyses.…”
Section: From Grammar to Processing Difficulty Predictions (mentioning)
confidence: 99%
“…The Transformer allows the attention for a token to be spread over the entire input sequence, multiple times, intuitively capturing different properties. This characteristic has led to a line of research focusing on the interpretation of Transformer-based networks and their attention mechanisms (Raganato and Tiedemann, 2018; Tang et al., 2018; Mareček and Rosa, 2019; Voita et al., 2019a; Vig and Belinkov, 2019; Clark et al., 2019; Kovaleva et al., 2019; Tenney et al., 2019; Lin et al., 2019; Jawahar et al., 2019; van Schijndel et al., 2019; Hao et al., 2019b; Rogers et al., 2020).…”
Section: Related Work (mentioning)
confidence: 99%
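
The attention maps this interpretation literature studies are directly accessible in common toolkits. A minimal sketch, assuming GPT-2 and Hugging Face transformers rather than any model from the cited works:

```python
# Sketch (assumed setup): extracting per-layer, per-head attention maps,
# the kind of object the interpretation literature cited above analyzes.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    out = model(ids, output_attentions=True)

# out.attentions: one tensor per layer, each of shape
# (batch, n_heads, seq_len, seq_len); every row is a distribution over
# earlier positions, since GPT-2's attention is causally masked.
print(len(out.attentions), out.attentions[0].shape)
```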