Nova

Ding, Chenchen; Utiyama, Masao; Sumita, Eiichiro

doi:10.1145/3276773

Cited by 20 publications

(1 citation statement)

References 4 publications

“…Most pairs are from previous WMT (Gu, Kk, Tr, Ro, Et, Lt, Fi, Lv, Cs, Es, Zh, De, Ru, Fr ↔ En) and IWSLT (Vi, Ja, Ko, Nl, Ar, It ↔ En) competitions. We also use FLoRes pairs (En-Ne and En-Si), En-Hi from IITB (Kunchukuttan et al., 2017), and En-My from WAT19 (Ding et al., 2018, 2019). We divide the datasets into three categories: low resource (<1M sentence pairs), medium resource (>1M and <10M), and high resource (>10M).…”
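The size buckets in the quoted passage can be written as a small helper. This is a hypothetical Python sketch, not code from either paper: the function name `resource_category` is invented. The quote also leaves exactly 1M and exactly 10M sentence pairs unassigned, so the sketch assumes that boundary values go to the higher bucket.

```python
def resource_category(num_pairs: int) -> str:
    """Bucket a parallel corpus by its number of sentence pairs.

    Thresholds follow the quoted passage (<1M low, 1M-10M medium, >10M high).
    Assumption: exactly 1M counts as medium and exactly 10M counts as high,
    since the quote does not say where the boundary values belong.
    """
    if num_pairs < 1_000_000:
        return "low"
    if num_pairs < 10_000_000:
        return "medium"
    return "high"


if __name__ == "__main__":
    # Hypothetical corpus sizes, for illustration only (not figures from the paper).
    for name, size in [("pair_a", 250_000), ("pair_b", 4_000_000), ("pair_c", 40_000_000)]:
        print(name, resource_category(size))
```

Using strict `<` comparisons keeps the two cut points in one place, so changing the boundary convention only means editing two constants.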

Section: Experimental Settings (mentioning)

confidence: 99%

Multilingual Denoising Pre-training for Neural Machine Translation

Liu, Goyal, et al. 2020

Transactions of the Association for Computational Linguistics

697

595

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective (Lewis et al., 2019). mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.
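The abstract names the BART objective without spelling it out. As we understand the mBART paper, its noise function masks about 35% of the words in each training instance using spans whose lengths are drawn from a Poisson distribution (λ = 3.5), replaces each span with a single mask token, and also permutes the order of sentences; the model is trained to reconstruct the original text. The Python below is a minimal sketch under those assumptions, not the authors' implementation. It works on whitespace-separated words rather than subword tokens, uses a made-up `<mask>` symbol, skips zero-length spans, and omits language-ID tokens. All function names are hypothetical.

```python
import math
import random

MASK = "<mask>"  # hypothetical mask symbol; real tokenizers define their own


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def text_infill(words, mask_ratio, lam, rng):
    """Mask about mask_ratio of the words in Poisson-length spans; each contiguous
    masked run is collapsed into a single MASK token."""
    num_to_mask = round(len(words) * mask_ratio)
    masked = [False] * len(words)
    covered, attempts = 0, 0
    while covered < num_to_mask and attempts < 10 * len(words) + 10:
        attempts += 1
        length = sample_poisson(lam, rng)
        if length == 0:
            continue  # simplification: zero-length spans are skipped, not inserted as bare masks
        length = min(length, num_to_mask - covered)  # never exceed the masking budget
        start = rng.randrange(len(words) - length + 1)
        for i in range(start, start + length):
            if not masked[i]:
                masked[i] = True
                covered += 1
    out, prev = [], False
    for word, is_masked in zip(words, masked):
        if is_masked and not prev:
            out.append(MASK)  # first word of a masked run becomes one MASK token
        elif not is_masked:
            out.append(word)
        prev = is_masked
    return out


def noise_document(sentences, rng, mask_ratio=0.35, lam=3.5):
    """Permute sentence order, then apply span infilling over the whole instance."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    words = " ".join(shuffled).split()
    return " ".join(text_infill(words, mask_ratio, lam, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    doc = ["the cat sat on the mat .", "it was a sunny day .", "then it slept ."]
    print(noise_document(doc, rng))  # noised input; the training target is the original text
```

Because each masked span collapses to one token, the model must also infer how many words are missing, which is the property that distinguishes span infilling from per-token masking.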