Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
DOI: 10.18653/v1/k17-1003

Exploring the Syntactic Abilities of RNNs with Multi-task Learning

Abstract: Recent work has explored the syntactic abilities of RNNs using the subject-verb agreement task, which diagnoses sensitivity to sentence structure. RNNs performed this task well in common cases, but faltered in complex sentences (Linzen et al., 2016). We test whether these errors are due to inherent limitations of the architecture or to the relatively indirect supervision provided by most agreement dependencies in a corpus. We trained a single RNN to perform both the agreement task and an additional task, either…
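To make the multi-task setup described in the abstract concrete, here is a minimal, hypothetical PyTorch sketch of one RNN trained jointly on the agreement task and an auxiliary objective. Language modelling is used purely as an illustrative second task; all layer sizes, variable names, and the loss weight are assumptions for the sketch, not the paper's actual configuration.

```python
# Minimal sketch: one shared LSTM encoder with two task heads, trained jointly.
# The auxiliary language-modelling task and all hyperparameters are illustrative.
import torch
import torch.nn as nn

class MultiTaskRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.agreement_head = nn.Linear(hidden_dim, 2)     # singular vs. plural verb
        self.lm_head = nn.Linear(hidden_dim, vocab_size)   # next-word prediction

    def forward(self, tokens):
        states, _ = self.lstm(self.embed(tokens))          # (batch, seq, hidden)
        last_state = states[:, -1, :]                      # state right before the verb
        return self.agreement_head(last_state), self.lm_head(states)

# One joint training step on dummy data: both losses share the same encoder.
model = MultiTaskRNN(vocab_size=1000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

tokens = torch.randint(0, 1000, (8, 12))      # dummy sentence prefixes
number = torch.randint(0, 2, (8,))            # dummy agreement labels
next_words = torch.randint(0, 1000, (8, 12))  # dummy next-word targets

agr_logits, lm_logits = model(tokens)
loss = nn.functional.cross_entropy(agr_logits, number) \
     + 0.5 * nn.functional.cross_entropy(lm_logits.reshape(-1, 1000), next_words.reshape(-1))
loss.backward()
optimizer.step()
```

The point of sharing the recurrent encoder is that whatever structural information the auxiliary task requires must be represented in the same hidden states used for agreement, which is the kind of more direct supervision the abstract describes.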

Cited by 25 publications (29 citation statements). References 30 publications (48 reference statements).
“…In this setup, the target-language impact is less visible, and gender accuracy at the LSTM state level is overall much higher than that of the mono-target systems (0.77 vs. 0.68 on average), whereas BLEU scores are slightly lower (−0.9% on average). While this is only an initial exploration of multilingual NMT systems, our results suggest that this kind of multi-task objective pushes the model to learn linguistic features in a more consistent way (Bjerva, 2017; Enguehard et al., 2017).…”
Section: Source-Target Language Relatedness
confidence: 85%
“…One of the first techniques for examining a neural network is the analysis of activation patterns in its hidden layers (Elman, 1991; Giles et al., 1992). Nowadays, given their popularity, recurrent neural networks are the most frequently evaluated networks, mainly investigated for the structures and linguistic properties they encode (Linzen et al., 2016; Enguehard et al., 2017; Kuncoro et al., 2017; Gulordava et al., 2018).…”
Section: Related Work
confidence: 99%
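The activation-pattern analysis mentioned in the statement above amounts to running a trained recurrent model over a sentence and recording its hidden state at every time step. The sketch below shows this in a hypothetical, untrained form; the model, sentence, and unit indices are placeholders, and no claim is made about the cited papers' exact procedures.

```python
# Minimal sketch of collecting hidden-state activations for inspection or probing.
import torch
import torch.nn as nn

torch.manual_seed(0)
embed = nn.Embedding(100, 16)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

sentence = torch.tensor([[5, 17, 42, 8, 63]])   # dummy token ids for one sentence
with torch.no_grad():
    states, _ = lstm(embed(sentence))           # (1, seq_len, hidden)

# Per-unit activation trajectories, e.g. to look for units tracking number or structure.
for unit in range(3):
    print(f"unit {unit}:", states[0, :, unit].tolist())
```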
“…In contrast to these approaches, the DSA-LSTM only models the probability of surface strings, albeit with an auxiliary loss that distills the next-word predictive distribution of a syntactic language model. Earlier work has also explored multi-task learning with syntactic objectives as an auxiliary loss in language modelling and machine translation (Luong et al., 2016; Eriguchi et al., 2016; Nadejde et al., 2017; Enguehard et al., 2017; Aharoni and Goldberg, 2017; Eriguchi et al., 2017). Our approach of injecting syntactic bias through a KD objective is orthogonal to this line of work, with the primary difference that the student DSA-LSTM has no direct access to syntactic annotations; it does, however, have access to the teacher RNNG's softmax distribution over the next word.…”
Section: Related Work
confidence: 99%
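The distillation-style auxiliary loss described in the last excerpt has a simple generic form: the student language model is trained to match both the observed next word and a teacher's next-word distribution. The sketch below illustrates only that generic objective; the teacher here is a frozen stand-in tensor rather than an actual RNNG, and the interpolation weight alpha is an illustrative choice, not a value from the cited work.

```python
# Sketch of a knowledge-distillation auxiliary loss for a language model:
# ordinary next-word cross-entropy mixed with a KL term against a teacher's softmax.
import torch
import torch.nn.functional as F

vocab_size, batch = 1000, 4
student_logits = torch.randn(batch, vocab_size, requires_grad=True)    # student's next-word scores
teacher_probs = torch.softmax(torch.randn(batch, vocab_size), dim=-1)  # stand-in teacher distribution
gold_next_word = torch.randint(0, vocab_size, (batch,))

alpha = 0.5  # mixes the ordinary LM loss with the distillation term
lm_loss = F.cross_entropy(student_logits, gold_next_word)
kd_loss = F.kl_div(F.log_softmax(student_logits, dim=-1), teacher_probs, reduction="batchmean")
loss = (1 - alpha) * lm_loss + alpha * kd_loss
loss.backward()
```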