2017
DOI: 10.1162/coli_a_00300

Representation of Linguistic Form and Function in Recurrent Neural Networks

Abstract: We present novel methods for analyzing the activation patterns of RNNs from a linguistic point of view and explore the types of linguistic structure they learn. As a case study, we use a multi-task gated recurrent network architecture consisting of two parallel pathways with shared word embeddings trained on predicting the representations of the visual scene corresponding to an input sentence, and predicting the next word in the same sentence. Based on our proposed method to estimate the amount of contribution…
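For readers who want a concrete picture of the architecture the abstract describes, the following is a minimal PyTorch sketch (an assumption for illustration, not the authors' released code; all names and dimensions are invented): two GRU pathways share one word-embedding table, with one pathway predicting visual-scene features from its final state and the other predicting the next word at every position.

import torch
import torch.nn as nn

class TwoPathwayModel(nn.Module):
    # Hypothetical multi-task model: shared embeddings feeding a visual
    # pathway and a textual (language-model) pathway, as in the abstract.
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, image_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # shared word embeddings
        self.visual_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_image = nn.Linear(hidden_dim, image_dim)   # predict scene features
        self.textual_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)  # predict next word

    def forward(self, tokens):                             # tokens: (batch, seq)
        e = self.embed(tokens)                             # (batch, seq, embed_dim)
        _, vis_last = self.visual_gru(e)                   # final hidden state
        image_pred = self.to_image(vis_last.squeeze(0))    # (batch, image_dim)
        txt_states, _ = self.textual_gru(e)                # one state per position
        next_word_logits = self.to_vocab(txt_states)       # (batch, seq, vocab)
        return image_pred, next_word_logits

Training such a model would presumably combine an image-prediction loss (e.g., mean squared error against image feature vectors) with a next-word cross-entropy loss, reflecting the multi-task setup.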


Cited by 115 publications (112 citation statements). References 21 publications.
“…Analysis of Neural Network Models. Our work joins a recent strand in NLP that systematically analyzes what different neural network models learn about language (Linzen et al., 2016; Kádár et al., 2017; Conneau et al., 2018; Gulordava et al., 2018b; Nematzadeh et al., 2018, a.o.). This work, like ours, has yielded both positive and negative results: There is evidence that they learn complex linguistic phenomena of morphological and syntactic nature, like long distance agreement (Gulordava et al., 2018b; Giulianelli et al., 2018), but less evidence that they learn how language relates to situations; for instance, Nematzadeh et al. (2018) show that memory-augmented neural models fail on tasks that require keeping track of inconsistent states of the world.…”
Section: Related Work (mentioning)
confidence: 78%
“…Another method to measure relevance is by removing the input and tracking the difference in the network's output (Li et al., 2016b). While these methods focus on explaining a model's decision, Shi et al. (2016), Kádár et al. (2017), and Calvillo and Crocker (2018) investigate how a particular concept is represented in the network. Analyzing and interpreting the attention mechanism in NLP (Koehn and Knowles, 2017; Ghader and Monz, 2017; Tang and Nivre, 2018; Clark et al., 2019; Vig and Belinkov, 2019) is another direction that has drawn major interest.…”
Section: Related Work (mentioning)
confidence: 99%
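To make the input-removal idea concrete, here is a minimal sketch (an assumption for illustration, not code from any of the cited papers): delete one token at a time and measure how far the sentence encoding moves, in the spirit of Li et al. (2016b) and of the omission scores proposed by Kádár et al. (2017). The encode function is hypothetical; any model that maps a token sequence to a fixed-size vector would do.

import torch
import torch.nn.functional as F

def omission_scores(encode, tokens):
    # encode: hypothetical callable mapping a token list to a 1-D tensor.
    full = encode(tokens)
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]  # sentence with token i removed
        partial = encode(reduced)
        # 1 - cosine similarity: large when removing token i changes the encoding a lot
        scores.append(1.0 - F.cosine_similarity(full, partial, dim=0).item())
    return scores

Tokens with high scores are those whose removal most perturbs the representation, which is one way to read off which inputs the network relies on.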
“…This paper focuses on computational models of visually grounded speech, introduced by [14, 4]. Learned representations of such models were analyzed by [11, 7, 4]: [11] introduced novel methods for interpreting the activation patterns of recurrent neural networks (RNNs) in a model of visually grounded meaning representation from textual and visual input, and showed that RNNs pay attention to word tokens belonging to specific lexical categories. [4] found that final layers tend to encode semantic information whereas lower layers tend to encode form-related information.…”
Section: Introduction (mentioning)
confidence: 99%