Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1080

What do Neural Machine Translation Models Learn about Morphology?

Abstract: Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture. However, little is known about what these models learn about source and target languages during the training process. In this work, we analyze the representations learned by neural MT models at various levels of granularity and empirically evaluate the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks. We conduct a t…
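The evaluation the abstract describes is a probing setup: hidden states are extracted from a trained, frozen NMT encoder, and a lightweight classifier is trained on top of them to predict part-of-speech or morphological tags, so that classifier accuracy serves as a proxy for how much morphology the representations encode. A minimal PyTorch sketch of that idea follows; the dimensions, the Probe class, and the toy tensors are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: assume a frozen NMT encoder emits 500-d
# hidden states, probed with a linear layer over 17 POS tags.
HIDDEN_DIM, NUM_TAGS = 500, 17

class Probe(nn.Module):
    """Logistic-regression probe over frozen encoder states."""
    def __init__(self, hidden_dim: int, num_tags: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_tags)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: (num_tokens, hidden_dim); gradients never
        # reach the encoder because its outputs are detached.
        return self.classifier(states)

probe = Probe(HIDDEN_DIM, NUM_TAGS)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy stand-ins for detached encoder states and gold POS tags.
states = torch.randn(32, HIDDEN_DIM)
tags = torch.randint(0, NUM_TAGS, (32,))

for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(probe(states), tags)
    loss.backward()
    optimizer.step()
```

The design point is that the probe is deliberately weak (a single linear layer), so its accuracy reflects what is linearly decodable from the representations rather than what the probe itself can compute.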

Cited by 238 publications (282 citation statements)
References 31 publications (38 reference statements)

Citation statements:
“…Vinyals et al., 2015), and social cues available to humans (see Box 3). Despite the limitation of the training set and objective function, surprisingly, models of this kind (e.g., Devlin et al., 2018) may also implicitly learn some compositional properties of language, such as syntax, from the structure of the input (Linzen et al., 2016; Belinkov et al., 2017; Baroni, 2019; Hewitt & Manning, 2019).…”
Section: Language Model (GPT-2), mentioning
confidence: 99%
“…Several authors have proposed convolutional neural networks over character sequences, as part of models of part-of-speech tagging (Santos and Zadrozny, 2014), named entity recognition (Ma and Hovy, 2016; Chiu and Nichols, 2015), language modeling (Kim et al., 2015), and machine translation (Costa-jussà and Fonollosa, 2016; Belinkov et al., 2017). The latter presents an in-depth analysis of representations learned by neural MT models.…”
Section: Related Work, mentioning
confidence: 99%
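For concreteness, a character-level CNN word encoder of the kind this statement describes can be sketched as below: embed the characters of a word, convolve over the character axis, and max-pool to obtain a fixed-size word vector. This is a minimal sketch under assumed vocabulary and filter sizes, not the exact architecture of any of the cited papers.

```python
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    """Builds a word representation from its character sequence:
    embed chars, apply a 1-D convolution, max-pool over positions."""
    def __init__(self, num_chars=100, char_dim=25, num_filters=50, width=3):
        super().__init__()
        self.embed = nn.Embedding(num_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, num_filters,
                              kernel_size=width, padding=1)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        # char_ids: (batch, max_word_len) -> (batch, num_filters)
        x = self.embed(char_ids).transpose(1, 2)  # (batch, char_dim, len)
        x = torch.relu(self.conv(x))              # (batch, filters, len)
        return x.max(dim=2).values                # pool over characters

encoder = CharCNNWordEncoder()
words = torch.randint(0, 100, (4, 12))  # toy batch: 4 words, 12 chars each
print(encoder(words).shape)             # torch.Size([4, 50])
```

Because the pooling step collapses the character axis, words of any length map to the same-size vector, which is what lets such encoders feed tagging or translation models in place of word embeddings.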
“…Moreover, it is crucial for trainers to understand whether a model learns a good representation of the data as a secondary effect of the training, and to detect potential biases or origins of errors in a model [9]. To address this issue, many model-understanding techniques aim to visualize or analyze learned global features of a model [8, 12, 63, 90].…”
Section: Passive Observation, mentioning
confidence: 99%