Can LSTM Learn to Capture Agreement? The Case of Basque

Ravfogel, Shauli; Goldberg, Yoav; Tyers, Francis M.

doi:10.18653/v1/w18-5412

Cited by 37 publications

(34 citation statements)

References 16 publications

(18 reference statements)

“…Our main observation here is that training, development and test splits are random subsamples of one sample of data. Using random subsamples this way is common in machine learning, including bias detection studies (Elazar and Goldberg, 2018; Zhao et al, 2019) and probing studies (Ravfogel et al, 2018; Lin et al, 2019), but is known to overestimate performance (Globerson and Roweis, 2016), in particular for high-dimensional problems.…”

Section: Adversarial Attribute Removal With Diagnostic Classifiers (mentioning)

confidence: 99%

Adversarial Removal of Demographic Attributes Revisited

Barrett, Kementchedjhieva, Elazar, et al. 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen

Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a heldout subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.

“…This approach often leverages experimental paradigms from psycholinguistics that were originally developed to characterize the representations used by humans. Ravfogel et al (2018) trained RNNs to predict the agreement features of a verb in Basque; perfect accuracy on this task requires identifying the subject of the verb, which in turn requires sophisticated syntactic representations (Linzen et al 2016). In Basque, which differs from English in a large number of properties, accuracy was substantially lower than in earlier studies on English.…”

Section: What Do Neural Network Learn About Language? (mentioning)

confidence: 96%

Analyzing and interpreting neural networks for NLP: A report on the first BlackboxNLP workshop

2019

The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner-workings and representations acquired by neural models of language. Approaches included: systematic manipulation of input to neural networks and investigating the impact on their performance, testing whether interpretable knowledge can be decoded from intermediate representations acquired by neural networks, proposing modifications to neural network architectures to make their knowledge state or generated output more explainable, and examining the performance of networks on simplified or formal languages. Here we review a number of representative studies in each category.

“…However, Kuncoro et al (2018) have also shown that although sequential LSTMs can learn syntactic information, a recursive neural network that explicitly models hierarchy (the Recurrent Neural Network Grammar model from Dyer et al [2015]) is better at this: It performs better on the number agreement task from Linzen, Dupoux, and Goldberg (2016). In addition, Ravfogel, Goldberg, and Tyers (2018) and Ravfogel, Goldberg, and Linzen (2019) have cast some doubts on the results by Linzen, Dupoux, and Goldberg (2016) and Gulordava et al (2018) by looking at Basque and synthetic languages with different word orders, respectively, in the two studies.…”

Section: Recursive Vs Recurrent Neural Network (mentioning)

confidence: 99%

What Should/Do/Can LSTMs Learn When Parsing Auxiliary Verb Constructions?

Lhoneux, Stymne, Nivre 2021

Computational Linguistics

There is a growing interest in investigating what neural NLP models learn about language. A prominent open question is the question of whether or not it is necessary to model hierarchical structure. We present a linguistic investigation of a neural parser adding insights to this question. We look at transitivity and agreement information of auxiliary verb constructions (AVCs) in comparison to finite main verbs (FMVs). This comparison is motivated by theoretical work in dependency grammar and in particular the work of Tesnière (1959) where AVCs and FMVs are both instances of a nucleus, the basic unit of syntax. An AVC is a dissociated nucleus, it consists of at least two words, and an FMV is its non-dissociated counterpart, consisting of exactly one word. We suggest that the representation of AVCs and FMVs should capture similar information. We use diagnostic classifiers to probe agreement and transitivity information in vectors learned by a transition-based neural parser in four typologically different languages. We find that the parser learns different information about AVCs and FMVs if only sequential models (BiLSTMs) are used in the architecture but similar information when a recursive layer is used. We find explanations for why this is the case by looking closely at how information is learned in the network and looking at what happens with different dependency representations of AVCs. We conclude that there may be benefits to using a recursive layer in dependency parsing and that we have not yet found the best way to integrate it in our parsers.
