2017
DOI: 10.1007/978-3-319-64206-2_27
Neural Machine Translation for Morphologically Rich Languages with Improved Sub-word Units and Synthetic Data

Cited by 31 publications (34 citation statements)
References 14 publications
“…Similarly to the method by Pinnis et al (2017b) that allows training NMT models that are more robust to unknown and rarely occurring words, we supplemented the parallel training data with synthetic parallel training sentences. To create the synthetic corpus, we performed word alignment on the parallel corpus using fast-align (Dyer et al, 2013).…”
Section: Synthetic Data
confidence: 99%
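The approach quoted above pairs word alignment with rare-word replacement: aligned word pairs that occur rarely are substituted with identical placeholders on both sides, producing synthetic sentence pairs from which the model can learn to pass unknown tokens through. A minimal sketch of that idea follows; the function name, the `<unk-N>` placeholder format, and the frequency threshold are illustrative assumptions, not the authors' implementation (which consumes fast-align output):

```python
from collections import Counter

def make_synthetic_pairs(parallel_corpus, alignments, min_freq=2):
    """Create synthetic sentence pairs in which rarely occurring, aligned
    words are replaced by identical placeholders on both sides.

    parallel_corpus: list of (src_tokens, tgt_tokens) token lists
    alignments: per-sentence lists of (src_idx, tgt_idx) alignment links,
                e.g. as produced by a word aligner such as fast-align
    """
    src_freq = Counter(tok for src, _ in parallel_corpus for tok in src)

    synthetic = []
    for (src, tgt), links in zip(parallel_corpus, alignments):
        new_src, new_tgt = list(src), list(tgt)
        placeholder_id = 0
        replaced = False
        for s_i, t_i in links:
            if src_freq[src[s_i]] < min_freq:  # rare word -> placeholder
                ph = "<unk-{}>".format(placeholder_id)
                new_src[s_i] = ph
                new_tgt[t_i] = ph
                placeholder_id += 1
                replaced = True
        if replaced:  # keep only pairs that actually gained placeholders
            synthetic.append((new_src, new_tgt))
    return synthetic

# Toy corpus: "Floxat" occurs once, all other source words twice.
corpus = [
    (["das", "ist", "gut"], ["that", "is", "good"]),
    (["das", "Floxat", "ist", "gut"], ["the", "floxat", "is", "good"]),
]
links = [
    [(0, 0), (1, 1), (2, 2)],
    [(0, 0), (1, 1), (2, 2), (3, 3)],
]
synthetic = make_synthetic_pairs(corpus, links, min_freq=2)
```

Only the second pair survives, with the rare word replaced by the same placeholder on both sides, which is what lets the NMT system learn a copy-through behaviour for such tokens.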
“…Then, the pre-processed sentence is translated with the NMT system. Our NMT models have been trained to leave the unknown word place-holders untranslated, i.e., to pass them through to the target side (Pinnis et al, 2017b). The capability of the NMT system to pass the place-holders through unchanged is vital for the further steps to work.…”
Section: NMT Only Transl (For Comparison)
confidence: 99%
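The pass-through behaviour described above implies a pre-process/translate/post-process pipeline: unknown words are masked with placeholders before translation, the NMT system leaves the placeholders untranslated, and a post-processing step restores the original words. A minimal sketch under those assumptions (the function names and `<unk-N>` token format are illustrative, and the NMT call is stubbed out):

```python
def preprocess(tokens, vocab):
    """Replace out-of-vocabulary tokens with indexed placeholders and
    return the masked sentence plus a mapping for restoration."""
    mapping = {}
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
        else:
            ph = "<unk-{}>".format(len(mapping))
            mapping[ph] = tok
            out.append(ph)
    return out, mapping

def postprocess(translated_tokens, mapping):
    """Restore original words in place of placeholders that the NMT
    system passed through to the target side unchanged."""
    return [mapping.get(tok, tok) for tok in translated_tokens]

vocab = {"the", "is", "good"}
masked, mapping = preprocess(["the", "floxat", "is", "good"], vocab)
# Stand-in for the actual NMT system; a trained model would reorder and
# translate the in-vocabulary words while leaving placeholders intact.
translated = masked
restored = postprocess(translated, mapping)
```

As the excerpt notes, the whole scheme hinges on the model reliably copying placeholders to the target side; if a placeholder is dropped or altered, the restoration step cannot recover the original word.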
“…A problem with back-translation is that model predictions are inevitably erroneous. Translation errors can propagate to subsequent steps and impair the performance of back-translation, especially when the synthetic corpus is much larger than the authentic parallel corpus (Pinnis et al., 2017; Fadaee and Monz, 2018; Poncelas et al., 2018). Therefore, it is crucial to develop principled solutions that enable back-translation to better cope with this error propagation problem.…”
Section: Introduction
confidence: 99%
“…Nevertheless, more is not always better: Pinnis et al. (2017) report that using a moderate amount of back-translated data yields an improvement, whereas doubling that amount lowers the results, although they still remain better than using no back-translated data at all.…”
Section: Filtered Synthetic Training Data
confidence: 92%