Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.324

Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation

Abstract: Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs. However, it can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time. That is, research on multilingual UNMT has been limited. In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder, making use of multilingual data to improve UNMT for al…
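The abstract is truncated above, so the paper's exact setup is not fully visible here. As a rough illustration of the "single encoder and single decoder for many languages" idea it mentions, the sketch below shows the common target-language-tag convention for routing one shared model across translation directions; this is an assumption about how such a system could be organized, not a description of the paper's actual implementation.

```python
# Illustrative only, not the paper's code: one common way to serve many
# translation directions with a single shared encoder/decoder is to prepend
# a target-language tag to every source sentence.
LANG_TAGS = {"fr": "<2fr>", "de": "<2de>", "ro": "<2ro>"}  # hypothetical subset of the 13 languages

def tag_source(tokens, tgt_lang):
    """Prepend a target-language token so one shared model can be routed to
    any supported output language at training or inference time."""
    return [LANG_TAGS[tgt_lang]] + tokens

# The same shared model sees differently tagged inputs for different directions:
print(tag_source(["hello", "world"], "fr"))  # ['<2fr>', 'hello', 'world']
print(tag_source(["hello", "world"], "de"))  # ['<2de>', 'hello', 'world']
```

Under this convention, no per-pair parameters are needed: the tag alone tells the shared decoder which language to produce.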

Cited by 29 publications (24 citation statements)
References 26 publications
“…• We do not have any parallel data among any of the language pairs, as considered in (Liu et al., 2020; Sun et al., 2020).…”
Section: Terminology
confidence: 99%
“…While the focus was originally on single-label image classification, KD has also been extended to the multi-label setting (Liu et al., 2018b). In NLP, KD has usually been applied in supervised settings (Kim and Rush, 2016; Huang et al., 2018; Yang et al., 2020), but also in some unsupervised tasks (usually using an unsupervised teacher for a supervised student) (Sun et al., 2020). Xu et al. (2018) use word embeddings jointly learned with a topic model in a procedure they term distillation, but do not follow the method from Hinton et al. (2015) that we employ (instead opting for joint-learning).…”
Section: Related Work
confidence: 99%
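Since the quoted passage singles out the soft-target recipe of Hinton et al. (2015) as the distillation method being followed, a minimal sketch of that loss may help; the names `distillation_loss`, `T`, and `alpha` are illustrative placeholders, not taken from any of the cited papers.

```python
# Minimal sketch of soft-target knowledge distillation in the style of
# Hinton et al. (2015); names are illustrative, not from the cited papers.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-smoothed teacher
    # and student distributions, rescaled by T^2 as in the original paper.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Example usage with random tensors (batch of 4, 10 classes):
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
gold = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, gold))
```

Raising the temperature T softens both distributions, so the student also learns from the relative probabilities the teacher assigns to incorrect classes rather than only from its top prediction.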
“…The traditional BT analyzed in Section 2 and illustrated in Figure 2(a) allows us to train a T → S model with the help of an S → T model, and vice versa; however, this mutually beneficial training is performed entirely within one language pair. Multilingual UNMT (MUNMT) (Sun et al., 2020) is a special case of UNMT that is capable of translating between multiple source and target languages. Although multiple language pairs are trained jointly in MUNMT, there is an obvious shortcoming for BT: translating between language pairs that do not occur together during training, i.e., lack of optimization across language pairs.…”
Section: Cross-lingual Back-translation
confidence: 99%
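To make the back-translation loop described in the quoted passage concrete, here is a schematic sketch of one BT update within a single language pair; `translate` and `train_step` are hypothetical stand-ins for whatever decoding and optimization routines a real system provides.

```python
# Schematic sketch of one back-translation (BT) update within a single
# language pair, matching the quoted description. The model objects and
# their `translate`/`train_step` methods are hypothetical placeholders.
def back_translation_step(model_s2t, model_t2s, target_monolingual_batch):
    # 1) Use the T -> S model to turn monolingual target sentences into
    #    synthetic source sentences.
    synthetic_sources = [model_t2s.translate(t) for t in target_monolingual_batch]
    # 2) Train the S -> T model on (synthetic source, real target) pairs.
    for src, tgt in zip(synthetic_sources, target_monolingual_batch):
        model_s2t.train_step(src, tgt)
    # The symmetric step (roles swapped) updates model_t2s.
```

Because every synthetic pair is produced by exactly one S↔T model pair, nothing in this loop couples different language pairs, which is the shortcoming the passage attributes to BT in MUNMT.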