2020
DOI: 10.48550/arxiv.2001.08210
Preprint

Multilingual Denoising Pre-training for Neural Machine Translation

Abstract: This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, o…

Cited by 251 publications (127 citation statements)
References 34 publications
“…However, the success of the above DL-based methods relies heavily on large-scale datasets, posing a challenge for supervised and cross-domain text generation tasks. Since 2018, large-scale pre-trained language models (PLMs) such as BERT [Devlin et al. 2018], RoBERTa, GPT, T5 [Raffel et al. 2019] and mBART [Liu et al. 2020a] have gradually become a new paradigm of NLP. Owing to their use of large corpora and unsupervised learning based on the Transformer structure, PLMs are believed to have learned a great deal of semantic and syntactic knowledge from the data, and only fine-tuning is required for downstream tasks to reach state-of-the-art (SOTA) performance.…”
Section: AI Chatbot Story Generation (mentioning)
confidence: 99%
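As a rough illustration of the fine-tuning paradigm described in the statement above, the sketch below runs a single supervised update on a released mBART checkpoint. It assumes the Hugging Face transformers and torch packages and the public facebook/mbart-large-cc25 weights; it is a minimal sketch, not the cited papers' actual training recipe.

```python
# Minimal sketch: one fine-tuning step on a pre-trained mBART checkpoint
# (assumes Hugging Face `transformers` and `torch`; illustrative only).
import torch
from transformers import MBartForConditionalGeneration, MBartTokenizer

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="en_XX", tgt_lang="ro_RO"
)

# A single (source, target) pair stands in for a downstream task's training data.
batch = tokenizer(
    ["UN Chief Says There Is No Military Solution in Syria"],
    text_target=["Şeful ONU declară că nu există o soluţie militară în Siria"],
    return_tensors="pt",
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(**batch).loss   # seq2seq cross-entropy against the target tokens
loss.backward()
optimizer.step()
print(float(loss))
```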
“…Seq2seq Models: the seq2seq models use both the encoder and the decoder of the Transformer for better model flexibility. Currently, the most representative models of this type include T5 [Raffel et al. 2019] and mBART [Liu et al. 2020a]. In principle, almost all pre-training tasks used in AE and AR models can be adapted to seq2seq models.…”
Section: Output: Story Paragraph (mentioning)
confidence: 99%
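To make the encoder-decoder point concrete, here is a hedged sketch of running a released multilingual seq2seq checkpoint end to end: the encoder reads the source sentence and the decoder generates the target. The facebook/mbart-large-50-many-to-many-mmt model and the Hugging Face API are assumptions made for illustration, not something asserted by the quoted paper.

```python
# Sketch: the full encoder-decoder is exercised at inference time
# (assumes Hugging Face `transformers`; illustrative only).
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

tokenizer.src_lang = "en_XX"                      # language of the source sentence
inputs = tokenizer("The weather is nice today.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["de_DE"],  # decode into German
    max_length=40,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```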
“…We take the idea of adding a special token id/tag to each input sentence in our mid-tuning dataset from Gao et al. (2020) and Liu et al. (2020). It helps the model to differentiate between the sentence s_a and the semantic form r.…”
Section: Learning and Aligning Encoders (mentioning)
confidence: 99%
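The "special token id/tag per input sentence" idea resembles the language-id tags that multilingual tokenizers attach to every sentence. The snippet below, which assumes the Hugging Face MBart50 tokenizer purely for illustration (it is not the cited paper's code), shows such a tag appearing in the tokenized sequence.

```python
# Sketch: a language tag is attached to every input sentence so the model can tell
# which language (or, by analogy, which kind of input) it is reading.
from transformers import MBart50TokenizerFast

tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer.src_lang = "en_XX"

ids = tokenizer("A tagged input sentence.")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# The printed tokens include the special `en_XX` tag alongside the word pieces,
# which is what lets the model differentiate inputs of different kinds.
```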
“…Zhang et al. (2020) suggest that the performance degradation results from limited multilingual NMT model capacity. Some research overcame such degradation by fine-tuning the whole model on the bilingual corpus (Neubig and Hu, 2018; Conneau and Lample, 2019; Liu et al., 2020). However, fine-tuning the whole model is parameter-inefficient: it consumes a large amount of storage to archive separate models for different translation pairs.…”
Section: Introduction (mentioning)
confidence: 99%
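As a rough illustration of the storage argument in the last statement, the sketch below freezes a pre-trained mBART model and leaves only its layer-norm parameters trainable, so each language pair would store a small parameter subset rather than a full model copy. This is a generic parameter-efficient-tuning sketch under assumed Hugging Face tooling, not the approach of the cited work.

```python
# Sketch: freeze the shared pre-trained model and train only a tiny, pair-specific
# subset (here: layer-norm parameters), avoiding one full model copy per language pair.
from transformers import MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

for name, param in model.named_parameters():
    # "layer_norm"/"layernorm" picks out the LayerNorm weights and biases in mBART.
    param.requires_grad = "layer_norm" in name or "layernorm" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
# An optimizer over only the trainable subset would then drive fine-tuning, e.g.:
# torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=3e-5)
```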