Proceedings of the Third Conference on Machine Translation: Research Papers 2018
DOI: 10.18653/v1/w18-6301

Scaling Neural Machine Translation

Abstract: Sequence-to-sequence learning models still require several days to reach state-of-the-art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation. On WMT'14 English-German translation, we match the accuracy of Vaswani et al. (2017) in under 5 hours when training on 8 GPUs and we obtain a new state of the art of 29.3 BLEU after training for…
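The abstract attributes the speedup to two ingredients: reduced-precision (FP16) arithmetic and very large batches built up by accumulating gradients over several forward/backward passes. The snippet below is a minimal sketch of that combination using PyTorch's automatic mixed precision; the model, data, learning rate, and accumulation factor are illustrative placeholders, not the paper's fairseq setup.

```python
# Sketch of FP16 training with gradient accumulation (large effective batches).
# Model and data are placeholders; this is not the paper's fairseq implementation.
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

model = nn.Linear(512, 32000).cuda()                 # hypothetical stand-in for an NMT model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
criterion = nn.CrossEntropyLoss()
scaler = GradScaler()                                # dynamic loss scaling for FP16
accum_steps = 16                                     # micro-batches per weight update

optimizer.zero_grad()
for step in range(160):                              # stands in for iterating a data loader
    src = torch.randn(64, 512, device="cuda")        # fake source features
    tgt = torch.randint(0, 32000, (64,), device="cuda")  # fake target tokens
    with autocast():                                 # forward pass in mixed precision
        loss = criterion(model(src), tgt) / accum_steps
    scaler.scale(loss).backward()                    # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:                # one update per large effective batch
        scaler.step(optimizer)                       # unscales gradients, then applies the update
        scaler.update()                              # adjust the dynamic loss scale
        optimizer.zero_grad()
```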


Cited by 526 publications (511 citation statements)
References 30 publications
“…However, Guillou (2013) and Carpuat and Simard (2012) find that translations generated by a machine translation system tend to be similarly or more lexically consistent, as measured by a similar metric, than human ones. This even holds for sentence-level systems, where the increased consistency is not due to improved cohesion, but accidental: Ott et al. (2018) show that beam search introduces a bias towards frequent words, which could be one factor explaining this finding. This means that a higher repetition rate does not mean that a translation system is in fact more cohesive, and we find that even our baseline is more repetitive than the human reference.…”
Section: Additional Related Work (mentioning)
confidence: 95%
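The excerpt treats lexical repetition rate as a rough proxy for consistency. As an illustration only (the actual metrics in Guillou (2013) and Carpuat and Simard (2012) are more involved), one simple repetition rate is the fraction of content tokens that are repeat occurrences:

```python
# Hedged sketch of a simple repetition-rate proxy; not the metric used in the cited work.
from collections import Counter

def repetition_rate(tokens, stopwords=frozenset()):
    """Fraction of non-stopword tokens that are repeat occurrences of an earlier token."""
    content = [t.lower() for t in tokens if t.lower() not in stopwords]
    if not content:
        return 0.0
    counts = Counter(content)
    repeats = sum(c - 1 for c in counts.values())   # occurrences beyond the first
    return repeats / len(content)

# A system output that reuses "mistake" scores as more repetitive than one that varies its wording.
print(repetition_rate("the mistake was a mistake".split(), stopwords={"the", "a", "was"}))  # 0.5
```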
“…This is 0.6 BLEU points higher than the pre-norm Transformer-Big. It should be noted that although our best score of 29.3 is the same as Ott et al. (2018), our approach requires 3.5x fewer training epochs than theirs.…”
Section: In Both (mentioning)
confidence: 99%
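For readers unfamiliar with the pre-norm Transformer variant mentioned in this excerpt, the difference from the original post-norm design is only where layer normalization sits relative to the residual connection. A schematic sketch follows (illustrative, not the cited implementation; `sublayer` stands for self-attention or the feed-forward block):

```python
# Sketch of post-norm vs. pre-norm residual orderings in a Transformer block.
import torch
import torch.nn as nn

class PostNormBlock(nn.Module):
    """Original Transformer ordering: residual add first, then LayerNorm."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))

class PreNormBlock(nn.Module):
    """Pre-norm variant: LayerNorm before the sublayer, residual kept outside."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

# Example with a feed-forward sublayer; a self-attention sublayer slots in the same way.
ffn = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
x = torch.randn(8, 10, 512)
print(PreNormBlock(512, ffn)(x).shape, PostNormBlock(512, ffn)(x).shape)
```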
“…Currently, existing Transformer-based speech applications [2]-[4] still lack an open-source toolkit and reproducible experiments, while previous studies in NMT [5], [6] provide them. Therefore, we work on an open, community-driven project for end-to-end speech applications using both Transformer and RNN, following the success of Kaldi for hidden Markov model (HMM)-based ASR [7].…”
Section: Introduction (mentioning)
confidence: 99%