Proceedings of the Third Conference on Machine Translation: Shared Task Papers 2018
DOI: 10.18653/v1/w18-6427

The University of Cambridge’s Machine Translation Systems for WMT18

Abstract: The University of Cambridge submission to the WMT18 news translation task focuses on the combination of diverse models of translation. We compare recurrent, convolutional, and self-attention-based neural models on German-English, English-German, and Chinese-English. Our final system combines all neural models with a phrase-based SMT system in an MBR-based scheme. We report small but consistent gains on top of strong Transformer ensembles.
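The MBR-based combination mentioned in the abstract follows the general minimum Bayes risk decision rule: among candidate translations pooled from all systems, select the one with the highest expected utility (e.g., sentence-level BLEU) under the combined models' posterior. The sketch below illustrates generic MBR selection only; it is not the paper's exact n-gram-posterior formulation, and `mbr_select`, the toy `overlap` utility, and the example scores are all hypothetical.

```python
import math

def mbr_select(candidates, log_probs, utility):
    """Generic MBR decision rule: return the candidate with the highest
    expected utility under the (softmax-normalized) model posterior."""
    m = max(log_probs)
    weights = [math.exp(lp - m) for lp in log_probs]
    z = sum(weights)
    posterior = [w / z for w in weights]
    # Each candidate is scored against all candidates acting as pseudo-references.
    def expected_utility(hyp):
        return sum(p * utility(hyp, ref) for p, ref in zip(posterior, candidates))
    return max(candidates, key=expected_utility)

# Toy usage with unigram overlap as a stand-in for sentence-level BLEU.
def overlap(hyp, ref):
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)

hyps = ["the cat sat", "a cat sat", "the dog ran"]
scores = [-1.0, -1.1, -2.5]  # hypothetical combined model log-probabilities
print(mbr_select(hyps, scores, overlap))
```

In an MBR combination, candidates that many systems agree on accumulate utility mass, which is why the scheme rewards consensus translations across the diverse model types.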

Cited by 18 publications (40 citation statements) · References 36 publications
“…Especially, for newstest2016 and newstest2018, we achieve a 1.0 BLEU score improvement over MS-Marian and set new records for these tasks. We also list the WMT18 top-2 systems for De→En translation in Table 4: the RWTH (Graça et al., 2018) and UCAM (Stahlberg et al., 2018) systems, which are both ensemble models. Similarly, our single model surpasses these ensemble systems by a large margin.…”
Section: Results
confidence: 99%
“…As our APE model seems agnostic to the model which produced the RTT, we applied it to the best submissions of the recent WMT18 evaluation campaign, applying it to the German-original half of the test set only. Table 4 shows the results for the two top submissions, Microsoft (Junczys-Dowmunt, 2018) and Cambridge (Stahlberg et al., 2018). Both systems improved by up to 0.8 BLEU points.…”
Section: English→German
confidence: 99%
“…Our LMs are Transformer (Vaswani et al., 2017) decoders (transformer big) trained using the Tensor2Tensor library (Vaswani et al., 2018). We delay SGD updates (Stahlberg et al., 2018a; Saunders et al., 2018) with factor 2 to simulate 500K training steps with 8 GPUs on 4 physical GPUs. Training batches contain about 4K source and target tokens.…”
Section: Methods
confidence: 99%
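The "delayed SGD updates" quoted above are gradient accumulation: gradients from several batches are summed before a single optimizer step, so 4 physical GPUs with a delay factor of 2 see the same effective batch as 8 GPUs. Below is a minimal PyTorch-style sketch of the technique; the cited work used Tensor2Tensor, so the `train` function, model, and data here are purely illustrative, not their implementation.

```python
import torch
from torch import nn

ACCUM_FACTOR = 2  # "delay factor": accumulate gradients over 2 batches

def train(model, optimizer, batches, loss_fn, accum=ACCUM_FACTOR):
    """Delayed SGD updates (gradient accumulation): sum gradients over
    `accum` batches before one optimizer step, emulating a batch that is
    `accum` times larger (e.g., 4 physical GPUs behaving like 8)."""
    optimizer.zero_grad()
    for step, (x, y) in enumerate(batches):
        loss = loss_fn(model(x), y)
        # Scale so the accumulated gradient equals the mean over the
        # full effective batch, not the sum of per-batch means.
        (loss / accum).backward()
        if (step + 1) % accum == 0:
            optimizer.step()
            optimizer.zero_grad()

# Toy usage: a linear model on random data (purely illustrative).
model = nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
data = [(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(4)]
train(model, opt, data, nn.MSELoss())
```

The per-batch loss is divided by the accumulation factor so that the summed gradient matches what a single large batch would produce, keeping the learning-rate schedule comparable across hardware setups.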