2020
DOI: 10.1162/tacl_a_00343

Multilingual Denoising Pre-training for Neural Machine Translation

Abstract: This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART—a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective (Lewis et al., 2019). mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches have focused only on t…
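The denoising objective referenced in the abstract corrupts a document and trains the sequence-to-sequence model to reconstruct the original: mBART masks word spans whose lengths are drawn from a Poisson distribution (λ = 3.5), covering roughly 35% of the words, and permutes the order of sentences within each instance. The Python sketch below illustrates that noising step only; it is a simplified stand-in, not the authors' fairseq implementation, and the function names, the per-position masking heuristic, and the example sentences are illustrative assumptions.

import math
import random

MASK = "<mask>"

def poisson(rng, lam):
    # Draw a Poisson-distributed integer using Knuth's method.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= threshold:
            return k - 1

def add_noise(sentences, mask_ratio=0.35, poisson_lambda=3.5, seed=0):
    # Simplified mBART-style noising: permute sentence order, then replace
    # token spans (lengths drawn from Poisson(lambda)) with a single <mask>
    # token until roughly `mask_ratio` of the tokens are covered.
    rng = random.Random(seed)

    # (1) Sentence permutation within the instance.
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    tokens = [tok for sent in shuffled for tok in sent.split()]

    # (2) Text infilling: each masked span collapses to one <mask> token.
    budget = int(mask_ratio * len(tokens))
    noised, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            span = max(1, min(poisson(rng, poisson_lambda), budget))
            noised.append(MASK)
            i += span
            budget -= span
        else:
            noised.append(tokens[i])
            i += 1
    return " ".join(noised)

# The training target is the original, un-noised document, so a
# sequence-to-sequence model learns to reconstruct full texts.
doc = [
    "Pre-training uses monolingual text in many languages .",
    "The decoder reconstructs the original document .",
    "A language id token marks each instance .",
]
print(add_noise(doc))

During pre-training the decoder is supervised with the un-noised document, and the same model is then fine-tuned on bitext for translation; the language-id token mentioned in the example follows the paper's setup, but its exact placement here is an assumption of this sketch.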

Cited by 545 publications (522 citation statements: 7 supporting, 515 mentioning, 0 contrasting)
References 27 publications
“…Moreover, it is less applicable to low-resource language pairs without adequate bitext data. Self-supervised pre-training approaches (Radford et al., 2018; Devlin et al., 2019; Conneau and Lample, 2019; Lewis et al., 2019; Liu et al., 2020), which train the model with denoising learning objectives on large-scale monolingual data, have achieved remarkable performance in many NLP applications. However, the catastrophic forgetting effect (Thompson et al., 2019), where fine-tuning on a new task degrades performance on the original task, limits the success of continued NMT training on models pre-trained with monolingual data.…”
Section: Introduction (mentioning, confidence: 99%)
“…Self-supervised Learning: This work is motivated by the recent success of self-supervised learning for NLP applications (Radford et al., 2018; Devlin et al., 2019; Lample et al., 2018a,b; Conneau and Lample, 2019; Lewis et al., 2019; Liu et al., 2020). Different denoising objectives have been designed to train neural networks on large-scale unlabeled text.…”
Section: Introduction (mentioning, confidence: 99%)
“…For future work, we want to improve the quality of our generation models, since there seems to be much room for improvement when compared to human performance. It may also be interesting to apply other pre-training methods (Yang et al., 2019; Liu et al., 2020) as well as to incorporate knowledge of the characters in question (Ghazvininejad et al., 2018) in order to enhance the character-ness of the generated utterances. We also want to examine the relationship between the naturalness of a generated response and the degree to which the meta-information can be reflected.…”
Section: Summary and Future Work (mentioning, confidence: 99%)
“…, pre-trained language models are showing promising results in a wide variety of natural language processing tasks (Devlin et al., 2019; Radford et al., 2019; Yang et al., 2019; Liu et al., 2020). Trained on massive amounts of data, such models capture the meaning of words in context more accurately, enabling them to be fine-tuned for particular downstream tasks.…”
(mentioning, confidence: 99%)
“…Another recent approach, mBART (Liu et al., 2020), leverages both monolingual and parallel data and also yields improvements in translation quality for lower-resource languages such as Nepali, Sinhala, and Gujarati. While this provides a solution for small quantities of training data or monolingual resources, the extent to which standard BLEU evaluations reflect translation quality is not yet clear, since human evaluation studies are missing.…”
Section: Multilingual (mentioning, confidence: 99%)