Proceedings of the Second Conference on Machine Translation 2017
DOI: 10.18653/v1/w17-4735

The RWTH Aachen University English-German and German-English Machine Translation System for WMT 2017

Abstract: This paper describes the statistical machine translation system developed at RWTH Aachen University for the English→German and German→English translation tasks of the EMNLP 2017 Second Conference on Machine Translation (WMT 2017). We use ensembles of attention-based neural machine translation systems for both directions, trained on the provided parallel and synthetic data. In addition, we create a phrasal system using joint translation and reordering models in decoding and neural models in …
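The abstract mentions ensembles of attention-based NMT systems. A common way to ensemble such models at decoding time is to average their per-step output distributions; the sketch below illustrates that idea only. The `decode_step` interface and the arithmetic averaging are assumptions for illustration, not the interface of the RWTH system (which was built on Blocks/Theano).

```python
import numpy as np

def ensemble_step(models, states, prev_token):
    """Average the next-token distributions of several NMT models.

    `models` exposing a decode_step(prev_token, state) -> (probs, new_state)
    method is a hypothetical interface; averaging log-probabilities
    (a geometric mean) is an equally common alternative.
    """
    probs, new_states = [], []
    for model, state in zip(models, states):
        p, s = model.decode_step(prev_token, state)
        probs.append(p)
        new_states.append(s)
    return np.mean(probs, axis=0), new_states
```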

Cited by 4 publications (4 citation statements)
References 24 publications (21 reference statements)
“…We use a variant of the attention weight / fertility feedback of (Tu et al., 2016), which is inverted in our case so that a multiplication is used instead of a division, for better numerical stability. Our model was derived from the models presented by (Peter et al., 2017) and (Bahdanau et al., 2014).…”
Section: Performance Comparison
confidence: 99%
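The inverted fertility trick described in this citation can be sketched as follows: instead of dividing accumulated attention by a predicted fertility (as in Tu et al., 2016), a value in (0, 1) is predicted directly and multiplied in, avoiding division by a small fertility. Names and shapes below are illustrative assumptions, not taken from the cited system.

```python
import numpy as np

def coverage_with_inverse_fertility(cum_attention, fertility_logits):
    """Coverage feedback with an inverse fertility term.

    cum_attention:    accumulated attention weights per source position.
    fertility_logits: unnormalised fertility predictions per source position.
    """
    inv_fertility = 1.0 / (1.0 + np.exp(-fertility_logits))  # sigmoid -> (0, 1)
    # Multiplicative normalisation instead of cum_attention / fertility.
    return inv_fertility * cum_attention
```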
“…We modified the RWTH Aachen translation system described in (Peter et al., 2017), based on the Blocks framework (van Merriënboer et al., 2015) and Theano (Theano Development Team, 2016), to also work as a recurrent language model. The training data is chosen to be equivalent to that used to train the count-based models.…”
Section: Neural Network Language Model
confidence: 99%
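Reusing a translation system as a recurrent language model amounts to running its decoder without an encoder, predicting each token from the previous ones. The minimal LSTM language model below sketches that idea; the class, sizes, and PyTorch framework are assumptions for illustration, since the cited system is built on Blocks/Theano.

```python
import torch.nn as nn

class RecurrentLM(nn.Module):
    """Minimal LSTM language model (illustrative sizes)."""

    def __init__(self, vocab_size, emb_dim=512, hidden_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, time) indices; logits predict the next token at each step.
        hidden, _ = self.rnn(self.embed(tokens))
        return self.proj(hidden)

# Training would use standard cross-entropy over shifted targets, e.g.:
# loss = nn.CrossEntropyLoss()(logits[:, :-1].reshape(-1, V), tokens[:, 1:].reshape(-1))
```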
“…The Transformer model was trained on the standard parallel WMT 2018 data sets (namely Europarl, CommonCrawl, NewsCommentary and Rapid, 5.9M sentence pairs in total) as well as the 4.2M sentence pairs of synthetic data created in (Sennrich et al., 2016a). Last year's submission was an ensemble of several carefully crafted models using an RNN encoder and decoder, trained on the same data plus 6.9M additional synthetic sentences (Peter et al., 2017). We try 20k and 50k merge operations for BPE and find that 50k performs better by 0.5% to 1.0% BLEU.…”
Section: German→English
confidence: 99%
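The 20k vs. 50k "merge operations" compared in this citation refer to how many byte-pair-encoding merges are learned on the training corpus; more merges yield longer subword units and a larger vocabulary. The sketch below is a didactic re-implementation of BPE merge learning in the style of Sennrich et al. (2016), not the subword tooling the cited systems actually used.

```python
import re
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merges from a {word: corpus_count} dictionary.

    For the setups compared above, num_merges would be 20000 or 50000.
    """
    # Represent each word as space-separated symbols plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the most frequent symbol pair with its merge.
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = {pattern.sub("".join(best), w): f for w, f in vocab.items()}
    return merges
```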