2020
DOI: 10.1162/tacl_a_00343

Multilingual Denoising Pre-training for Neural Machine Translation

Abstract: This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART—a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective (Lewis et al., 2019). mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches have focused only on t…
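The denoising objective referenced in the abstract corrupts a document and trains the sequence-to-sequence model to reconstruct the original: mBART masks word spans whose lengths are drawn from a Poisson distribution (λ = 3.5), covering roughly 35% of the words, and permutes the order of sentences within each instance. The Python sketch below illustrates that noising step only; it is a simplified stand-in, not the authors' fairseq implementation, and the function names, the per-position masking heuristic, and the example sentences are illustrative assumptions.

import math
import random

MASK = "<mask>"

def poisson(rng, lam):
    # Draw a Poisson-distributed integer using Knuth's method.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= threshold:
            return k - 1

def add_noise(sentences, mask_ratio=0.35, poisson_lambda=3.5, seed=0):
    # Simplified mBART-style noising: permute sentence order, then replace
    # token spans (lengths drawn from Poisson(lambda)) with a single <mask>
    # token until roughly `mask_ratio` of the tokens are covered.
    rng = random.Random(seed)

    # (1) Sentence permutation within the instance.
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    tokens = [tok for sent in shuffled for tok in sent.split()]

    # (2) Text infilling: each masked span collapses to one <mask> token.
    budget = int(mask_ratio * len(tokens))
    noised, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            span = max(1, min(poisson(rng, poisson_lambda), budget))
            noised.append(MASK)
            i += span
            budget -= span
        else:
            noised.append(tokens[i])
            i += 1
    return " ".join(noised)

# The training target is the original, un-noised document, so a
# sequence-to-sequence model learns to reconstruct full texts.
doc = [
    "Pre-training uses monolingual text in many languages .",
    "The decoder reconstructs the original document .",
    "A language id token marks each instance .",
]
print(add_noise(doc))

During pre-training the decoder is supervised with the un-noised document, and the same model is then fine-tuned on bitext for translation; the language-id token mentioned in the example follows the paper's setup, but its exact placement here is an assumption of this sketch.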

Cited by 545 publications (522 citation statements: 7 supporting, 515 mentioning, 0 contrasting)
References 27 publications
“…Moreover, it is less applicable to low-resource language pairs without adequate bitext data. Self-supervised pre-training approaches (Radford et al., 2018; Devlin et al., 2019; Conneau and Lample, 2019; Lewis et al., 2019; Liu et al., 2020), which train the model with denoising learning objectives on large-scale monolingual data, have achieved remarkable performance in many NLP applications. However, the catastrophic forgetting effect (Thompson et al., 2019), where fine-tuning on a new task degrades performance on the original task, limits the success of continued NMT training on models pre-trained with monolingual data.…”
Section: Introduction (mentioning, confidence: 99%)
“…Self-supervised Learning: This work is motivated by the recent success of self-supervised learning for NLP applications (Radford et al., 2018; Devlin et al., 2019; Lample et al., 2018a,b; Conneau and Lample, 2019; Lewis et al., 2019; Liu et al., 2020). Different denoising objectives have been designed to train neural networks on large-scale unlabeled text.…”
Section: Introduction (mentioning, confidence: 99%)
“…For future work, we want to improve the quality of our generation models, since there seems to be much room for improvement when compared to human performance. It may also be interesting to apply other pre-training methods (Yang et al., 2019; Liu et al., 2020) as well as to incorporate knowledge of the characters in question (Ghazvininejad et al., 2018) in order to enhance the character-ness of the generated utterances. We also want to examine the relationship between the naturalness of a generated response and the degree to which the meta-information can be reflected.…”
Section: Summary and Future Work (mentioning, confidence: 99%)
“…, pre-trained language models are showing promising results in a wide variety of natural language processing tasks (Devlin et al., 2019; Radford et al., 2019; Yang et al., 2019; Liu et al., 2020). Trained on massive amounts of data, such models capture the meaning of words in context more accurately, enabling them to be fine-tuned for particular downstream tasks.…”
(mentioning, confidence: 99%)
“…Another recent approach, mBART (Liu et al., 2020), leverages both monolingual and parallel data and also yields improvements in translation quality for lower-resource languages such as Nepali, Sinhala, and Gujarati. While this provides a solution for small quantities of training data or monolingual resources, the extent to which standard BLEU evaluations reflect translation quality is not yet clear, since human evaluation studies are missing.…”
Section: Multilingual (mentioning, confidence: 99%)