Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1045

Understanding Back-Translation at Scale

Abstract: An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences. This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences. We find that in all but resource poor settings back-translations obtained via sampling or noised beam outputs are most effective. Our analysis shows that sampling or noisy synthetic data gives a much stronger training signal than data generated by beam or greedy search. […]
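To make the contrast between beam search and sampling concrete, here is a minimal sketch of generating synthetic source sentences from target-side monolingual text. The Hugging Face `transformers` API and the `Helsinki-NLP/opus-mt-de-en` checkpoint are assumptions for illustration; the paper's own experiments used fairseq.

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-de-en"  # reverse model: target (de) -> source (en)
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Target-language monolingual text to be back-translated.
monolingual_de = ["Maschinelle Übersetzung ist nützlich."]
inputs = tokenizer(monolingual_de, return_tensors="pt", padding=True)

# Beam search: high-probability but low-diversity synthetic sources.
beam_ids = model.generate(**inputs, num_beams=5)

# Unrestricted sampling: noisier synthetic sources, which the paper found
# to carry a stronger training signal in high-resource settings.
sample_ids = model.generate(**inputs, do_sample=True, top_k=0, temperature=1.0)

print(tokenizer.batch_decode(beam_ids, skip_special_tokens=True))
print(tokenizer.batch_decode(sample_ids, skip_special_tokens=True))
```

Each synthetic source sentence is then paired with its authentic target sentence and mixed into the parallel training data.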

Cited by 775 publications (669 citation statements). References 37 publications.
“…Back-translation, either generating translations using beam search (i.e., SEARCH) or using sampling (i.e., SAMPLE), does lead to significant improvements over using only the authentic bilingual corpus (i.e., NONE). We find that SAMPLE is more effective than SEARCH, which confirms the finding of Edunov et al. (2018). Using uncertainty-based confidence (i.e., "U") significantly improves over both SEARCH and SAMPLE on the combination of all test sets (p < 0.01).…”
Section: Results (supporting)
confidence: 81%
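As an illustration of what uncertainty-based weighting of synthetic pairs can look like, the sketch below scores each back-translated sentence by its average token log-probability and scales the training loss accordingly. The function names and the exponential weighting are assumptions for illustration, not the cited paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sequence_confidence(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Average token log-probability of the sampled synthetic sentence.

    logits:  (batch, seq_len, vocab) decoder scores from the back-translation model
    targets: (batch, seq_len) ids of the tokens the model actually emitted
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_logp.mean(dim=-1)  # higher = model more certain of its own output

def weighted_loss(per_sentence_loss: torch.Tensor, confidence: torch.Tensor) -> torch.Tensor:
    """Down-weight synthetic pairs the model was uncertain about (hypothetical scheme)."""
    weights = confidence.exp()  # map mean log-prob back into (0, 1]
    return (weights * per_sentence_loss).sum() / weights.sum()
```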
“…SEARCH: the translations of the monolingual corpus are generated by beam search (Sennrich et al., 2016a). SAMPLE: the translations of the monolingual corpus are generated by sampling (Edunov et al., 2018). "CE": the confidence estimation method.…”
Section: Setup (mentioning)
confidence: 99%
“…According to the evaluation results reported by Arase and Tsujii (2017), the precision and recall of alignments are 83.6% and 78.9%, which are 89% and 92% of those of humans, respectively. Although alignment errors occur, previous studies show that neural networks are relatively robust against noise in a training corpus and still benefit from the extra supervision, as demonstrated by Edunov et al. (2018) and Prabhumoye et al. (2018). We collect all the spans of phrases in a sentential paraphrase pair and their alignments as pairs of phrase spans.…”
Section: Phrase Alignment for Paraphrases (mentioning)
confidence: 99%
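The "pairs of phrase spans" representation can be pictured with a small data structure; the names below are hypothetical, not the cited paper's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhraseSpanPair:
    """One phrase alignment between the two sides of a sentential paraphrase pair.

    Spans are half-open (start, end) token offsets into each sentence.
    """
    src_span: tuple[int, int]
    tgt_span: tuple[int, int]

# "the cat sat" <-> "the feline was seated": align "cat" (token 1) with "feline" (token 1).
alignment = [PhraseSpanPair(src_span=(1, 2), tgt_span=(1, 2))]
```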
“…For the purpose of the task, we extended the Marian toolkit with fp16 training, BERT models (Devlin et al., 2018), and multi-task training. Similar to Edunov et al. (2018), we use mixed-precision training with fp16 and an optimizer delay of 16 batches before each gradient update. We train on 8 Voltas with 16GB each.…”
Section: Model and Training (mentioning)
confidence: 99%
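The "optimizer delay" here is gradient accumulation: losses from several batches are back-propagated and their gradients summed before a single optimizer step, simulating a 16x larger effective batch. A minimal PyTorch sketch under that reading, with a placeholder model standing in for the actual NMT system:

```python
import torch
from torch import nn

model = nn.Linear(512, 512).cuda()            # placeholder for the NMT model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()          # loss scaling keeps fp16 grads finite
ACCUM_STEPS = 16                              # the "optimizer delay" of 16

for step in range(1000):
    x = torch.randn(32, 512, device="cuda")   # placeholder batch
    with torch.cuda.amp.autocast():           # mixed-precision (fp16) forward pass
        loss = model(x).pow(2).mean() / ACCUM_STEPS  # normalize for accumulation
    scaler.scale(loss).backward()             # gradients accumulate in .grad
    if (step + 1) % ACCUM_STEPS == 0:         # one update per 16 batches
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```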
“…We mostly reproduce the results from Edunov et al. (2018) and back-translate the entire German NewsCrawl data with noisy back-translation. Similar to the best method of Edunov et al. (2018), we use output sampling as the noising approach. This has been implemented in Marian with the Gumbel softmax trick.…”
Section: Noisy Back-Translation (mentioning)
confidence: 99%
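The trick referenced above (strictly, the Gumbel-max trick) rests on a simple fact: adding independent Gumbel(0, 1) noise to the logits and taking the argmax yields an exact sample from the softmax distribution, so an argmax-based decoder can be turned into a sampler by perturbing its scores. A minimal PyTorch sketch of the idea, not Marian's actual C++ implementation:

```python
import torch

def gumbel_max_sample(logits: torch.Tensor) -> torch.Tensor:
    """Sample token ids from softmax(logits) via the Gumbel-max trick.

    argmax(logits + g) with g ~ Gumbel(0, 1) is distributed exactly as a
    draw from softmax(logits), so a greedy/beam decoder samples by simply
    adding noise to its scores.
    """
    gumbel = -torch.log(-torch.log(torch.rand_like(logits)))  # Gumbel(0, 1) noise
    return torch.argmax(logits + gumbel, dim=-1)

# Sanity check: empirical frequencies approach the softmax probabilities.
logits = torch.tensor([2.0, 1.0, 0.0])
draws = torch.stack([gumbel_max_sample(logits) for _ in range(10000)])
print(torch.bincount(draws, minlength=3) / 10000)  # ~ softmax(logits)
print(torch.softmax(logits, dim=-1))
```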