Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1430

Exploiting Monolingual Data at Scale for Neural Machine Translation

Abstract: While target-side monolingual data has been proven to be very useful to improve neural machine translation (briefly, NMT) through back translation, source-side monolingual data is not well investigated. In this work, we study how to use both the source-side and target-side monolingual data for NMT, and propose an effective strategy leveraging both of them. First, we generate synthetic bitext by translating monolingual data from the two domains into the other domain using the models pretrained on genuine bitext.…
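As a concrete illustration of the strategy sketched in the abstract, the snippet below is a minimal, hypothetical sketch of turning monolingual data from both sides into synthetic bitext with two pretrained translation models. The Hugging Face Marian checkpoints and the load/translate helpers are assumptions chosen for illustration, not the authors' actual pipeline.

```python
# Hypothetical sketch: build synthetic bitext from monolingual data on both sides,
# using two models pretrained on genuine bitext (here, public Marian checkpoints).
from transformers import MarianMTModel, MarianTokenizer

def load(name):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

def translate(sentences, tokenizer, model):
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    out = model.generate(**batch)
    return tokenizer.batch_decode(out, skip_special_tokens=True)

# Pretrained forward (source->target) and backward (target->source) models.
fwd_tok, fwd_model = load("Helsinki-NLP/opus-mt-en-de")   # En -> De
bwd_tok, bwd_model = load("Helsinki-NLP/opus-mt-de-en")   # De -> En

mono_src = ["The weather is nice today."]        # source-side monolingual data
mono_tgt = ["Das Buch liegt auf dem Tisch."]     # target-side monolingual data

# Forward-translate source monolingual data: (real source, synthetic target).
synthetic_from_src = list(zip(mono_src, translate(mono_src, fwd_tok, fwd_model)))
# Back-translate target monolingual data: (synthetic source, real target).
synthetic_from_tgt = list(zip(translate(mono_tgt, bwd_tok, bwd_model), mono_tgt))

# The synthetic pairs would then be mixed with the genuine bitext for further training.
synthetic_bitext = synthetic_from_src + synthetic_from_tgt
```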

Citations: Cited by 39 publications (29 citation statements)
References: 25 publications
“…Self-learning (Zhang and Zong, 2016) leverages the source-side monolingual data. Dual learning paradigms utilize monolingual data in both source and target language (He et al., 2016; Wu et al., 2019). While these approaches can effectively improve the NMT performance, they have two limitations.…”
Section: Introduction
Confidence: 99%
“…Supervised Learning on Parallel Data. First, we evaluate our model's performance when trained with parallel data on standard WMT datasets. Table 2 shows that our model consistently outperforms both the VNMT and DCVAE models (… 2016a; Zhang and Zong, 2016; Wu et al., 2019). We use the joint training objective described in Equation 14.…”
Section: Translation Quality
Confidence: 88%
“…Back-translation (Sennrich, Haddow, and Birch 2016a), which generates a synthetic training corpus by translating the target-side monolingual sentences with a backward target-to-source model, is widely adopted due to its simplicity and effectiveness. (Wu et al. 2019) goes beyond back-translation and leverages both source-side and target-side monolingual data. Dual learning (He et al. 2016; Wang et al. 2019) is another way to leverage monolingual data, where the source sentence is first forward translated to the target space and then back translated to the source space.…”
Section: Neural Machine Translation
Confidence: 99%
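The dual-learning loop quoted above (forward translation followed by back translation) can be sketched as a simple round trip. The snippet below reuses the hypothetical load/translate helpers and Marian checkpoints from the earlier sketch, and the BLEU-based reconstruction reward is an illustrative assumption rather than the exact objective of He et al. (2016).

```python
# Toy round-trip sketch of the dual-learning signal described above.
# Assumes the translate helper and the fwd_*/bwd_* models from the
# earlier synthetic-bitext sketch are already defined in scope.
import sacrebleu

def round_trip_reward(src_sentence):
    forward = translate([src_sentence], fwd_tok, fwd_model)[0]    # source -> target
    reconstructed = translate([forward], bwd_tok, bwd_model)[0]   # target -> source
    # Reconstruction quality serves as an illustrative reward signal.
    return sacrebleu.sentence_bleu(reconstructed, [src_sentence]).score

print(round_trip_reward("The weather is nice today."))
```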