Neural Machine Translation (MT) has radically changed the way systems are developed. A major difference from the previous generation (Phrase-Based MT) is the way the two paradigms use monolingual target data, which often abounds. While Phrase-Based MT can seamlessly integrate very large language models trained on billions of sentences, the best option for Neural MT developers seems to be the generation of artificial parallel data through back-translation, a technique that fails to fully take advantage of existing datasets. In this paper, we conduct a systematic study of back-translation, comparing alternative uses of monolingual data, as well as multiple data generation procedures. Our findings confirm that back-translation is very effective and give new explanations as to why this is the case. We also introduce new data simulation techniques that are almost as effective, yet much cheaper to implement.
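The core idea of back-translation can be sketched in a few lines: monolingual target-side sentences are translated back into the source language and paired with their originals to augment the genuine parallel data. The sketch below is a minimal, runnable illustration; the `toy_backward_model` (a word-reversal stand-in) is a placeholder assumption, where a real setup would use a trained target-to-source NMT system.

```python
# Minimal sketch of back-translation for data augmentation.
# A real pipeline would use a trained target->source NMT model; here a
# toy word-reversal "model" stands in so the example is self-contained.

def toy_backward_model(target_sentence: str) -> str:
    """Stand-in for a target->source translation model (assumption)."""
    return " ".join(reversed(target_sentence.split()))

def back_translate(monolingual_target, backward_model):
    """Pair each monolingual target sentence with a synthetic source."""
    return [(backward_model(t), t) for t in monolingual_target]

parallel = [("ein Haus", "a house")]             # genuine parallel data
mono_target = ["a small house", "a big garden"]  # abundant target-side text

# Synthetic pairs are simply concatenated with the genuine ones before
# training the forward (source->target) model.
augmented = parallel + back_translate(mono_target, toy_backward_model)
```

The key design point is that the synthetic source side may be noisy, but the target side is fluent human text, which is what the forward model's decoder learns from.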
While recent advances in the state of the art of Machine Translation have brought translation quality a step further, it is regularly acknowledged that the standard automatic metrics do not provide enough insight to fully measure the impact of neural models. This paper proposes a new type of evaluation focused specifically on the morphological competence of a system with respect to various grammatical phenomena. Our approach uses automatically generated pairs of source sentences, where each pair tests one morphological contrast. This methodology is used to compare several systems submitted to WMT'17 for English into Czech and Latvian.
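The contrastive-pair methodology can be illustrated with a small sketch: for each source pair testing one morphological contrast, check whether the system's two outputs realize the expected distinct target forms. The data structure and the expected forms below are hypothetical assumptions for illustration, not the paper's actual test suite.

```python
# Sketch of contrastive-pair evaluation of morphological competence.
# Each item holds the system's outputs for both members of a source pair,
# plus the target word forms that should distinguish them (assumed format).

def contrast_accuracy(pairs):
    """A pair counts as correct only if each output contains its own
    expected form and not the other member's form."""
    correct = 0
    for out_a, out_b, form_a, form_b in pairs:
        if (form_a in out_a and form_b not in out_a
                and form_b in out_b and form_a not in out_b):
            correct += 1
    return correct / len(pairs)

pairs = [
    # Czech past-tense gender agreement (hypothetical examples)
    ("on šel domů", "ona šla domů", "šel", "šla"),  # contrast realized
    ("on šel domů", "on šel domů", "šel", "šla"),   # contrast missed
]
```

Scoring by contrast rather than by n-gram overlap isolates one grammatical phenomenon at a time, which is exactly what BLEU-style metrics average away.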
This paper describes the joint submission of the QT21 and HimL projects for the English→Romanian translation task of the ACL 2016 First Conference on Machine Translation (WMT 2016). The submission is a system combination which combines twelve different statistical machine translation systems provided by the different groups (RWTH Aachen University, LMU Munich, Charles University in Prague, University of Edinburgh, University of Sheffield, Karlsruhe Institute of Technology, LIMSI, University of Amsterdam, Tilde). The systems are combined using RWTH's system combination approach. The final submission shows an improvement of 1.0 BLEU compared to the best single system on newstest2016.
This paper describes the joint submission of the QT21 projects for the English→Latvian translation task of the EMNLP 2017 Second Conference on Machine Translation (WMT 2017). The submission is a system combination which combines seven different statistical machine translation systems provided by the different groups. The systems are combined using either RWTH's system combination approach, or USFD's consensus-based system-selection approach. The final submission shows an improvement of 0.5 BLEU compared to the best single system on newstest2017.
When translating between a morphologically rich language (MRL) and English, word forms in the MRL often encode grammatical information that is irrelevant with respect to English, leading to data sparsity issues. This problem can be mitigated by removing irrelevant information from the MRL through normalization. Such preprocessing is usually performed in a deterministic fashion, using hand-crafted rules and yielding suboptimal representations. We introduce here a simple way to automatically compute an appropriate normalization of the MRL and show that it can improve machine translation in both directions.