Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.475

Dynamic Data Selection and Weighting for Iterative Back-Translation

Abstract: Back-translation has proven to be an effective method to utilize monolingual data in neural machine translation (NMT), and iteratively conducting back-translation can further improve the model performance. Selecting which monolingual data to back-translate is crucial, as we require that the resulting synthetic data are of high quality and reflect the target domain. To achieve these two goals, data selection and weighting strategies have been proposed, with a common practice being to select samples close to the…
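The abstract's notion of selecting monolingual samples "close to the target domain" can be illustrated with a classic cross-entropy-difference heuristic (in the spirit of Moore-Lewis scoring). This is a minimal sketch, not the paper's actual method; the unigram language models and function names below are illustrative stand-ins.

```python
# Hypothetical sketch: rank monolingual sentences for back-translation by how
# in-domain they look, using smoothed unigram LMs and cross-entropy difference.
import math
from collections import Counter

def unigram_logprobs(corpus):
    """Add-one-smoothed unigram log-probabilities over a list of sentences."""
    counts = Counter(tok for sent in corpus for tok in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 for the unknown token
    logprobs = {t: math.log((c + 1) / (total + vocab)) for t, c in counts.items()}
    unk_lp = math.log(1 / (total + vocab))
    return logprobs, unk_lp

def cross_entropy(sentence, logprobs, unk_lp):
    """Per-token negative log-likelihood of a sentence under the unigram LM."""
    toks = sentence.split()
    return -sum(logprobs.get(t, unk_lp) for t in toks) / max(len(toks), 1)

def select_for_backtranslation(monolingual, in_domain, general, k):
    """Keep the k sentences with the largest H_general(s) - H_in_domain(s),
    i.e. those that look most like the target domain."""
    in_lp, in_unk = unigram_logprobs(in_domain)
    gen_lp, gen_unk = unigram_logprobs(general)
    scored = sorted(
        monolingual,
        key=lambda s: cross_entropy(s, gen_lp, gen_unk)
                      - cross_entropy(s, in_lp, in_unk),
        reverse=True,
    )
    return scored[:k]
```

In practice one would use stronger (e.g. neural) language models, but the design choice is the same: score each candidate by how much better the in-domain model explains it than a general-domain model, then back-translate the top-scoring samples.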

Cited by 37 publications (31 citation statements). References 29 publications (23 reference statements).
“…Some work explores data symmetry (Freitag and Firat, 2020; Birch et al., 2008; Lin et al., 2019). Zero-shot translation in severely low-resource settings exploits massive multilinguality, cross-lingual transfer, pretraining, iterative back-translation, and freezing subnetworks (Lauscher et al., 2020; Nooralahzadeh et al., 2020; Pfeiffer et al., 2020; Baziotis et al., 2020; Chronopoulou et al., 2020; Lin et al., 2020; Thompson et al., 2018; Luong et al., 2014; Dou et al., 2020).…”
Section: Machine Polyglotism and Pretraining (mentioning)
confidence: 99%
“…However, there are two issues in the train/dev/test splits used in . First, Ma et al. (2019) and Dou et al. (2020) find that some identical sentence pairs appear in both the training and test data. Second, randomly shuffling the bi-text data and splitting it into halves may introduce more overlap than exists in natural monolingual data, i.e., bilingual sentences from the same document are likely to be selected into the monolingual data (e.g., one sentence in the source split and its translation in the target split).…”
Section: Setup (mentioning)
confidence: 99%
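The two split issues quoted above (verbatim train/test overlap, and source/target leakage when bi-text is shuffled into monolingual halves) can both be detected with simple set checks. This is an illustrative sketch under assumed data shapes (lists of sentence-pair tuples), not tooling from the paper.

```python
# Illustrative checks for the two data-split issues described in the excerpt.

def pair_overlap(train_pairs, test_pairs):
    """Return test sentence pairs that also appear verbatim in the training data
    (issue 1: identical pairs shared between train and test)."""
    train_set = set(train_pairs)
    return [p for p in test_pairs if p in train_set]

def translation_leakage(src_half, tgt_half, bitext):
    """Count bitext pairs whose source sentence landed in one monolingual half
    while its reference translation landed in the other (issue 2: leakage from
    randomly splitting bi-text into 'monolingual' halves)."""
    src_set, tgt_set = set(src_half), set(tgt_half)
    return sum(1 for s, t in bitext if s in src_set and t in tgt_set)
```

Running such checks before constructing monolingual splits would surface exactly the overlap the citing authors report; a document-level (rather than sentence-level) split avoids the leakage in the second case.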