Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.324

Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation

Abstract: Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs. However, it can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time. That is, research on multilingual UNMT has been limited. In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder, making use of multilingual data to improve UNMT for al…
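The abstract is truncated above, so the paper's exact setup is not fully visible here. As a rough illustration of the "single encoder and single decoder for many languages" idea it mentions, the sketch below shows the common target-language-tag convention for routing one shared model across translation directions; this is an assumption about how such a system could be organized, not a description of the paper's actual implementation.

```python
# Illustrative only, not the paper's code: one common way to serve many
# translation directions with a single shared encoder/decoder is to prepend
# a target-language tag to every source sentence.
LANG_TAGS = {"fr": "<2fr>", "de": "<2de>", "ro": "<2ro>"}  # hypothetical subset of the 13 languages

def tag_source(tokens, tgt_lang):
    """Prepend a target-language token so one shared model can be routed to
    any supported output language at training or inference time."""
    return [LANG_TAGS[tgt_lang]] + tokens

# The same shared model sees differently tagged inputs for different directions:
print(tag_source(["hello", "world"], "fr"))  # ['<2fr>', 'hello', 'world']
print(tag_source(["hello", "world"], "de"))  # ['<2de>', 'hello', 'world']
```

Under this convention, no per-pair parameters are needed: the tag alone tells the shared decoder which language to produce.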

Cited by 29 publications (24 citation statements)
References 26 publications
“…• We do not have any parallel data among any of the language pairs, as considered in (Liu et al., 2020; Sun et al., 2020).…”
Section: Terminology
confidence: 99%
“…While the focus was originally on single-label image classification, KD has also been extended to the multi-label setting (Liu et al., 2018b). In NLP, KD has usually been applied in supervised settings (Kim and Rush, 2016; Huang et al., 2018; Yang et al., 2020), but also in some unsupervised tasks (usually using an unsupervised teacher for a supervised student) (Sun et al., 2020). Xu et al. (2018) use word embeddings jointly learned with a topic model in a procedure they term distillation, but do not follow the method from Hinton et al. (2015) that we employ (instead opting for joint-learning).…”
Section: Related Work
confidence: 99%
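Since the quoted passage singles out the soft-target recipe of Hinton et al. (2015) as the distillation method being followed, a minimal sketch of that loss may help; the names `distillation_loss`, `T`, and `alpha` are illustrative placeholders, not taken from any of the cited papers.

```python
# Minimal sketch of soft-target knowledge distillation in the style of
# Hinton et al. (2015); names are illustrative, not from the cited papers.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-smoothed teacher
    # and student distributions, rescaled by T^2 as in the original paper.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Example usage with random tensors (batch of 4, 10 classes):
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
gold = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, gold))
```

Raising the temperature T softens both distributions, so the student also learns from the relative probabilities the teacher assigns to incorrect classes rather than only from its top prediction.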
“…The traditional BT analyzed in Section 2 and illustrated in Figure 2(a) allows us to train a T → S model with the help of an S → T model, and vice versa; however, this mutually beneficial training is performed entirely within one language pair. Multilingual UNMT (MUNMT) (Sun et al., 2020) is a special case of UNMT that is capable of translating between multiple source and target languages. Although multiple language pairs are trained jointly in MUNMT, there is an obvious shortcoming for BT: translating between language pairs that do not occur together during training, i.e., lack of optimization across language pairs.…”
Section: Cross-lingual Back-translation
confidence: 99%
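To make the back-translation loop described in the quoted passage concrete, here is a schematic sketch of one BT update within a single language pair; `translate` and `train_step` are hypothetical stand-ins for whatever decoding and optimization routines a real system provides.

```python
# Schematic sketch of one back-translation (BT) update within a single
# language pair, matching the quoted description. The model objects and
# their `translate`/`train_step` methods are hypothetical placeholders.
def back_translation_step(model_s2t, model_t2s, target_monolingual_batch):
    # 1) Use the T -> S model to turn monolingual target sentences into
    #    synthetic source sentences.
    synthetic_sources = [model_t2s.translate(t) for t in target_monolingual_batch]
    # 2) Train the S -> T model on (synthetic source, real target) pairs.
    for src, tgt in zip(synthetic_sources, target_monolingual_batch):
        model_s2t.train_step(src, tgt)
    # The symmetric step (roles swapped) updates model_t2s.
```

Because every synthetic pair is produced by exactly one S↔T model pair, nothing in this loop couples different language pairs, which is the shortcoming the passage attributes to BT in MUNMT.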