Generalization and reliability of multilingual translation often depend heavily on the amount of parallel data available for each language pair of interest. In this paper, we focus on zero-shot generalization, a challenging setup that tests models on translation directions they have not been optimized for at training time. To approach the problem, we (i) reformulate multilingual translation as probabilistic inference, (ii) define the notion of zero-shot consistency and show why standard training often results in models unsuitable for zero-shot tasks, and (iii) introduce a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages. We evaluate our multilingual NMT models on multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl) and show that agreement-based learning often yields 2-3 BLEU zero-shot improvement over strong baselines without any loss in performance on supervised translation directions.
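To make the agreement idea concrete, the following is a minimal toy sketch (not the paper's exact objective): given a parallel pair (x, y), the x-to-z and y-to-z models each place a distribution over candidate translations z in an auxiliary language, and an agreement term penalizes disagreement between the two distributions. The symmetrized cross-entropy used here, and the dictionary representation of the candidate distributions, are illustrative assumptions.

```python
import math

def agreement_loss(p_src, p_tgt, eps=1e-12):
    """Toy agreement term: symmetrized cross-entropy between two models'
    distributions over candidate auxiliary-language translations.

    p_src[z] -- probability the x->z model assigns to candidate z
    p_tgt[z] -- probability the y->z model assigns to candidate z
    Lower values mean the two models agree more closely.
    """
    loss = 0.0
    for z in set(p_src) | set(p_tgt):
        ps = p_src.get(z, 0.0)
        pt = p_tgt.get(z, 0.0)
        # Cross-entropy in both directions; eps guards log(0).
        loss -= ps * math.log(pt + eps)
        loss -= pt * math.log(ps + eps)
    return loss

# Two models that agree on the auxiliary translation incur a small penalty;
# models that disagree incur a large one.
p = {"hola": 0.9, "saludos": 0.1}   # x->z model (hypothetical candidates)
q = {"hola": 0.9, "saludos": 0.1}   # y->z model, agreeing
r = {"hola": 0.1, "saludos": 0.9}   # y->z model, disagreeing
```

In training, such a term would be added to the usual supervised translation losses, nudging the model toward zero-shot consistency on directions that have no direct parallel data.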