RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning 2017
DOI: 10.26615/978-954-452-049-6_050

Curriculum Learning and Minibatch Bucketing in Neural Machine Translation

Abstract: We examine the effects of particular orderings of sentence pairs on the on-line training of neural machine translation (NMT). We focus on two types of such orderings: (1) ensuring that each minibatch contains sentences similar in some aspect and (2) gradual inclusion of some sentence types as the training progresses (so called "curriculum learning"). In our English-to-Czech experiments, the internal homogeneity of minibatches has no effect on the training but some of our "curricula" achieve a small improvement…
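To make the two orderings in the abstract concrete, below is a minimal Python sketch of (1) minibatch bucketing, where each batch is homogeneous in some property, and (2) a curriculum that gradually admits more sentence pairs as training progresses. The function names, the source-length criterion, and the linear inclusion schedule are illustrative assumptions, not the authors' actual implementation or features.

```python
import random

def bucket_minibatches(sentence_pairs, batch_size,
                       key=lambda pair: len(pair[0].split())):
    """Sketch of minibatch bucketing: sort pairs by a similarity criterion
    (here: source length, an assumed example) so each minibatch is internally
    homogeneous, then shuffle the order of the batches themselves."""
    ordered = sorted(sentence_pairs, key=key)
    batches = [ordered[i:i + batch_size]
               for i in range(0, len(ordered), batch_size)]
    random.shuffle(batches)  # randomize batch order, keep batches homogeneous
    return batches

def curriculum_pool(sentence_pairs, progress,
                    key=lambda pair: len(pair[0].split())):
    """Sketch of curriculum learning: with `progress` in [0, 1] (fraction of
    training completed), only the easiest pairs under `key` are available early,
    and harder ones are included gradually."""
    ordered = sorted(sentence_pairs, key=key)
    cutoff = max(1, int(len(ordered) * progress))
    return ordered[:cutoff]
```

For example, at 25% of training, `curriculum_pool(data, 0.25)` would expose only the shortest quarter of the corpus, and `bucket_minibatches` could then be applied to that pool before each epoch.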

Cited by 125 publications (148 citation statements)
References: 16 publications
“…Wang et al (2018b) define noise level and introduce a denoising curriculum. Kocmi and Bojar (2017) use linguistically-motivated features to classify examples into bins for scheduling. use reinforcement learning to learn a denoising curriculum based on noise level of examples.…”
Section: Curriculum Learning for NMT
Confidence: 99%
“…The idea of a curriculum was popularized by Bengio et al (2009), who viewed it as a way to improve convergence by presenting heuristically identified easy examples first. Two recent papers (Kocmi and Bojar, 2017; Zhang et al, 2018) explore similar ideas for NMT, and verify that this strategy can reduce training time and improve quality.…”
Section: Related Work
Confidence: 94%
“…4) The performances of Kocmi and Bojar (2017) and Zhang et al (2017) decreased significantly after reaching the highest BLEU. This is consistent with the hypothesis that NMT may forget the learned knowledge by directly removing corresponding sentences.…”
Section: Training Efficiency
Confidence: 95%
“…Beside the PBSMT (Koehn et al, 2007) and vanilla NMT, three typical existing approaches described in the introduction were empirically compared: 1) Curriculum learning using the source sentence length as the criterion (Kocmi and Bojar, 2017). 2) Gradual fine-tuning using language model-based cross-entropy (Wees et al, 2017).…”
Section: Baselines and Settings
Confidence: 99%