2021
DOI: 10.1016/j.cognition.2021.104699
Mechanisms for handling nested dependencies in neural-network language models and humans

Cited by 32 publications (28 citation statements)
References 52 publications
“…Here, we follow Pimentel et al. (2020a) and use both simple (linear) and complex (non-linear) models, as well as "complex" tasks (dependency parsing). As an alternative to parametric probes, stimulus-based non-parametric probing (Linzen et al., 2016; Jumelet and Hupkes, 2018; Marvin and Linzen, 2018; Gulordava et al., 2018a; Warstadt et al., 2019a, 2020a; Ettinger, 2020; Lakretz et al., 2021) has been used to show that even without a learned probe, BERT can predict syntactic properties with high confidence (Goldberg, 2019; Wolf, 2019). We use this class of non-parametric probes to investigate RoBERTa's ability to learn word order during pre-training.…”
Section: Related Work
Mentioning confidence: 99%
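The stimulus-based, non-parametric probing mentioned in this excerpt can be illustrated with a short sketch: instead of training a probe, one compares the masked-LM probabilities that BERT assigns to the grammatical versus the ungrammatical member of a minimal pair. The model name and example sentence below are illustrative assumptions, not taken from the cited works.

```python
# Compare the masked-LM probabilities BERT assigns to two candidate verb forms
# at a [MASK] position; no probe is trained on top of the representations.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def mask_probability(sentence_with_mask: str, candidate: str) -> float:
    """Probability BERT assigns to `candidate` at the [MASK] position."""
    inputs = tokenizer(sentence_with_mask, return_tensors="pt")
    mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_index].softmax(dim=-1)
    candidate_id = tokenizer.convert_tokens_to_ids(candidate)
    return probs[0, candidate_id].item()

# Subject-verb agreement minimal pair with an intervening distractor noun.
sentence = "The keys to the cabinet [MASK] on the table."
p_grammatical = mask_probability(sentence, "are")
p_ungrammatical = mask_probability(sentence, "is")
print(f"P(are)={p_grammatical:.4f}  P(is)={p_ungrammatical:.4f}  "
      f"correct form preferred: {p_grammatical > p_ungrammatical}")
```

Because no parameters are fit on top of the representations, any preference for the grammatical form reflects what the pre-trained model itself has learned.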
“…Many recent studies have treated neural LMs and contextualized word prediction models-primarily LSTM LMs (Sundermeyer et al., 2012), GPT-2 (Radford et al., 2019), and BERT (Devlin et al., 2019)-as psycholinguistic subjects to be studied behaviorally (Linzen et al., 2016; Gulordava et al., 2018; Goldberg, 2019). Some have studied whether models prefer grammatical completions in subject-verb agreement contexts (Marvin and Linzen, 2018; van Schijndel et al., 2019; Goldberg, 2019; Mueller et al., 2020; Lakretz et al., 2021), as well as in filler-gap dependencies (Wilcox et al., 2018). These are based on the approach of Linzen et al. (2016), where a model's ability to syntactically generalize is measured by its ability to choose the correct inflection in difficult structural contexts instantiated by tokens that the model has not seen together during training.…”
Section: Targeted Syntactic Evaluation
Mentioning confidence: 99%
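A minimal sketch of this targeted-syntactic-evaluation setup, assuming an off-the-shelf GPT-2 from Hugging Face: the model is scored on whether it assigns higher next-word probability to the correctly inflected verb after a structurally difficult prefix. The prefix and verb pair below are illustrative, not the stimuli used in the cited studies.

```python
# Score an autoregressive LM on a long-distance agreement item: compare the
# next-word probability of the correct verb inflection against the incorrect one.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def next_word_probability(prefix: str, word: str) -> float:
    """Probability of `word` (its first subword) immediately after `prefix`."""
    input_ids = tokenizer(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits
    next_token_probs = logits[0, -1].softmax(dim=-1)
    word_id = tokenizer(" " + word).input_ids[0]  # leading space matters for GPT-2 BPE
    return next_token_probs[word_id].item()

# Illustrative nested dependency: the main subject "keys" is plural, while the
# intervening embedded subject "man" is singular.
prefix = "The keys that the man holds"
p_correct = next_word_probability(prefix, "are")
p_wrong = next_word_probability(prefix, "is")
print(f"P(are)={p_correct:.4f}  P(is)={p_wrong:.4f}  "
      f"grammatical form preferred: {p_correct > p_wrong}")
```

Accuracy over a large set of such minimal pairs is then taken as the model's syntactic generalization score for that construction.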
“…To reduce the cost of training a classifier, Zhou and Srikumar (2021) indirectly predict the performance of probing classifiers by analyzing how the labeled data is represented in the vector space. Some studies identify neurons that make a large contribution to solving the desired task by looking at task performance when the activation of those neurons is forcibly controlled (Bau et al., 2018; Lakretz et al., 2021; Cao et al., 2021).…”
Section: Related Work
Mentioning confidence: 99%
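The ablation-style analysis described in this excerpt can be sketched as follows: clamp one hidden unit's activation with a forward hook and compare the model's agreement preference before and after. The choice of model, layer, unit index, and stimulus below are hypothetical assumptions; the cited works identify specific units in the networks they study rather than the arbitrary one used here.

```python
# Forcibly control a single hidden unit (clamp it to zero via a forward hook on a
# GPT-2 MLP layer) and measure how the agreement preference changes.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER, UNIT = 5, 123  # hypothetical target neuron, chosen for illustration only

def clamp_unit(module, inputs, output):
    output[..., UNIT] = 0.0  # force the unit's activation to a fixed value
    return output

def agreement_margin(prefix: str, correct: str, wrong: str) -> float:
    """P(correct verb) minus P(wrong verb) as the next word after `prefix`."""
    ids = tokenizer(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        probs = model(ids).logits[0, -1].softmax(dim=-1)
    c = tokenizer(" " + correct).input_ids[0]
    w = tokenizer(" " + wrong).input_ids[0]
    return (probs[c] - probs[w]).item()

prefix = "The keys to the cabinet"
baseline = agreement_margin(prefix, "are", "is")
handle = model.transformer.h[LAYER].mlp.register_forward_hook(clamp_unit)
ablated = agreement_margin(prefix, "are", "is")
handle.remove()
print(f"margin before: {baseline:.4f}, after clamping unit {UNIT}: {ablated:.4f}")
```

A large drop in the margin when a particular unit is clamped is taken as evidence that the unit contributes to tracking the agreement dependency.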