Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1108

Colorless Green Recurrent Networks Dream Hierarchically

Abstract: Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate here to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation nonsensical sentences…
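
The evaluation described in the abstract is a minimal-pair test: a trained language model should assign higher probability to the verb form that agrees in number with its (possibly distant) subject than to the mismatched form, even in nonsensical sentences. Below is a minimal sketch of that kind of scoring, not the authors' released evaluation code; the function and variable names (agreement_score, vocab, lm) are illustrative assumptions, and the model interface is whatever maps token ids to next-token logits.

```python
# Sketch (not the paper's code) of long-distance number agreement scoring:
# compare the LM's log-probability of the agreeing vs. non-agreeing verb.
import torch
import torch.nn.functional as F


def agreement_score(model, vocab, prefix_tokens, correct_verb, wrong_verb):
    """Return True if the LM prefers the correctly agreeing verb.

    `model` is assumed to be any callable mapping a LongTensor of token ids
    of shape [1, seq_len] to next-token logits of shape [1, vocab_size];
    `vocab` maps word strings to integer ids.
    """
    ids = torch.tensor([[vocab[w] for w in prefix_tokens]])
    logits = model(ids)                         # [1, vocab_size]
    log_probs = F.log_softmax(logits, dim=-1)
    return (log_probs[0, vocab[correct_verb]] >
            log_probs[0, vocab[wrong_verb]]).item()


# Hypothetical usage on a nonsensical ("colorless green") test item:
# agreement_score(lm, vocab,
#                 "the ideas i ate with the chair".split(),
#                 correct_verb="sleep", wrong_verb="sleeps")
```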

Cited by 405 publications (601 citation statements) | References 35 publications

“…Despite being affected by absolute distances between syntactically dependent tokens (Linzen et al, 2016), LSTMs tend to learn structural information to a certain extent even without being explicitly instructed to do so (Gulordava et al, 2018). Futrell and Levy (2018) discuss linguistic phenomena similar to those we discuss in §4.2, and show that LSTM encoder-decoder systems handle them better than previous N-gram based systems, despite being profoundly affected by distance.…”
Section: Long-distance Dependencies in MT (mentioning)
confidence: 69%
“…Some previous work does use targeted tests to examine specific capacities of LMs, often inspired by psycholinguistic methods. However, the majority of this work has focused on syntactic capabilities of LMs (Linzen et al, 2016; Gulordava et al, 2018; Marvin and Linzen, 2018; Wilcox et al, 2018; Futrell et al, 2019). Relevant to our case study here, using several of these tests Goldberg (2019) shows the BERT model to perform impressively on such syntactic diagnostics.…”
Section: Introduction (mentioning)
confidence: 69%
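
The statement above mentions running the same kind of targeted syntactic diagnostics on BERT (Goldberg, 2019), where a masked language model scores the candidate verb forms in a masked slot rather than predicting the next word. The sketch below illustrates that style of test; the Hugging Face transformers library, the bert-base-uncased checkpoint, the helper name masked_verb_preference, and the example sentence are stand-in assumptions, not details from the cited work.

```python
# Sketch of a masked-LM agreement diagnostic: fill a masked verb slot and
# compare the model's scores for the agreeing vs. non-agreeing verb form.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()


def masked_verb_preference(prefix, suffix, verbs):
    """Return each candidate verb's logit at the masked position."""
    text = f"{prefix} {tokenizer.mask_token} {suffix}".strip()
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    return {v: logits[tokenizer.convert_tokens_to_ids(v)].item() for v in verbs}


# Hypothetical usage on an attractor sentence:
# masked_verb_preference("the keys to the cabinet", "on the table .",
#                        ["are", "is"])
```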
“…In this particular study, we consider the LSTM language model that was made available by Gulordava et al (2018). This language model (LM) is a 2-layer LSTM with 650 hidden units in both layers, trained on a corpus with Wikipedia data.…”
Section: Generalised Contextual Decomposition (mentioning)
confidence: 99%
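
The citation statement above pins down the shape of the released model: two LSTM layers with 650 hidden units each, trained as a word-level language model on Wikipedia data. Below is a minimal PyTorch sketch of a model with that shape; it is not the authors' released code, and the 650-dimensional embeddings, the class and argument names, and the omission of dropout and weight tying are assumptions for illustration.

```python
# Sketch of a 2-layer, 650-unit LSTM language model (shape as described above).
import torch.nn as nn


class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=650, hidden_dim=650, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)  # next-word logits

    def forward(self, token_ids, hidden=None):
        # token_ids: [batch, seq_len] -> logits over the vocabulary per position
        emb = self.embedding(token_ids)
        output, hidden = self.lstm(emb, hidden)
        return self.decoder(output), hidden
```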