Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.250

Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?

Abstract: Recent studies on the analysis of multilingual representations focus on identifying whether there is an emergence of language-independent representations, or whether a multilingual model partitions its weights among different languages. While most such work has been conducted in a "black-box" manner, this paper aims to analyze individual components of a multilingual neural machine translation (NMT) model. In particular, we look at the encoder self-attention and encoder-decoder attention heads (in a many-to-one N…
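The abstract describes probing individual encoder self-attention and encoder-decoder attention heads of a many-to-one NMT model. As a minimal, hedged sketch of what such head-level inspection can look like (not the authors' code), the snippet below pulls per-head attention weights from a pretrained Marian many-to-one checkpoint; the checkpoint choice and the per-head entropy statistic are illustrative assumptions.

```python
# A minimal sketch, not the paper's method: extract per-head attention
# weights from a pretrained many-to-one NMT model. The checkpoint
# Helsinki-NLP/opus-mt-mul-en (many source languages -> English) is an
# assumed stand-in for the many-to-one setting the abstract describes.
import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-mul-en"
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name, output_attentions=True)
model.eval()

batch = tokenizer(["To jest test."], return_tensors="pt")  # Polish source
start = torch.tensor([[model.config.decoder_start_token_id]])
with torch.no_grad():
    out = model(**batch, decoder_input_ids=start)

# out.encoder_attentions: one (batch, heads, src_len, src_len) tensor per layer
# out.cross_attentions:   one (batch, heads, tgt_len, src_len) tensor per layer
for layer, enc_att in enumerate(out.encoder_attentions):
    # Mean attention entropy per head: one crude per-head statistic that
    # could be compared across source languages or language pairs.
    probs = enc_att.clamp_min(1e-9)
    entropy = -(probs * probs.log()).sum(-1).mean(dim=(0, 2))  # (heads,)
    print(f"layer {layer}: head entropies {entropy.tolist()}")
```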

Cited by 2 publications (2 citation statements). References 36 publications (43 reference statements).
“…Additionally, Shao and Nivre (2016) demonstrated the effectiveness of convolutional neural networks (CNNs) in English-to-Chinese transliteration, surpassing traditional PBSMT methods in terms of accuracy. Moreover, Jankowski et al. (2021) presented a paper focusing on multilingual NMT models tailored for Slavic languages, including Polish and Slovak. The paper introduced soft decoding, a technique that allows the NMT model to generate multiple translations simultaneously.…”
Section: Literature Review
Citation type: mentioning
Confidence: 99%
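The statement above credits Jankowski et al. (2021) with "soft decoding" for producing multiple translations at once, but gives no mechanism. As a rough stand-in only, the sketch below uses ordinary beam-search n-best lists, one standard way to obtain several candidate translations from a single input; the Polish-to-English checkpoint and decoding parameters are assumptions, not details from the cited paper.

```python
# A hedged illustration only: the citation statement does not describe the
# soft-decoding mechanism, so this sketch substitutes beam-search n-best
# lists to show one standard way an NMT model can emit several candidate
# translations for the same input. The checkpoint choice is an assumption.
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-pl-en"  # Polish -> English, per the Slavic focus
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

batch = tokenizer(["To jest przykładowe zdanie."], return_tensors="pt")
outputs = model.generate(
    **batch,
    num_beams=5,              # explore 5 hypotheses in parallel
    num_return_sequences=3,   # return the 3 best finished translations
    max_new_tokens=40,
)
for i, seq in enumerate(outputs):
    print(i, tokenizer.decode(seq, skip_special_tokens=True))
```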
“…13 CBOW results are shown in Figure 9 in the Appendix; there are also many identifiable clusters like body parts and clothing, but many others are less clear than clusters from the LSTM. 14 This approach is similar to the category distinction test for masked language models in Kim and Smolensky (2021). 15 The Captioning LSTM always needs an image input, so we used the mean image frame of the training set in this evaluation.…”
Citation type: mentioning
Confidence: 99%