Proceedings of the Third Conference on Machine Translation: Research Papers 2018
DOI: 10.18653/v1/w18-6301

Scaling Neural Machine Translation

Abstract: Sequence-to-sequence learning models still require several days to reach state-of-the-art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation. On WMT'14 English-German translation, we match the accuracy of Vaswani et al. (2017) in under 5 hours when training on 8 GPUs and we obtain a new state of the art of 29.3 BLEU after training for…
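The abstract attributes the speedup to two ingredients: reduced-precision (FP16) arithmetic and very large batches built up by accumulating gradients over several forward/backward passes. The snippet below is a minimal sketch of that combination using PyTorch's automatic mixed precision; the model, data, learning rate, and accumulation factor are illustrative placeholders, not the paper's fairseq setup.

```python
# Sketch of FP16 training with gradient accumulation (large effective batches).
# Model and data are placeholders; this is not the paper's fairseq implementation.
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

model = nn.Linear(512, 32000).cuda()                 # hypothetical stand-in for an NMT model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
criterion = nn.CrossEntropyLoss()
scaler = GradScaler()                                # dynamic loss scaling for FP16
accum_steps = 16                                     # micro-batches per weight update

optimizer.zero_grad()
for step in range(160):                              # stands in for iterating a data loader
    src = torch.randn(64, 512, device="cuda")        # fake source features
    tgt = torch.randint(0, 32000, (64,), device="cuda")  # fake target tokens
    with autocast():                                 # forward pass in mixed precision
        loss = criterion(model(src), tgt) / accum_steps
    scaler.scale(loss).backward()                    # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:                # one update per large effective batch
        scaler.step(optimizer)                       # unscales gradients, then applies the update
        scaler.update()                              # adjust the dynamic loss scale
        optimizer.zero_grad()
```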


Cited by 526 publications (511 citation statements)
References 30 publications
“…However, Guillou (2013) and Carpuat and Simard (2012) find that translations generated by a machine translation system tend to be similarly or more lexically consistent, as measured by a similar metric, than human ones. This even holds for sentence-level systems, where the increased consistency is not due to improved cohesion, but accidental: Ott et al. (2018) show that beam search introduces a bias towards frequent words, which could be one factor explaining this finding. This means that a higher repetition rate does not mean that a translation system is in fact more cohesive, and we find that even our baseline is more repetitive than the human reference.…”
Section: Additional Related Work (mentioning)
confidence: 95%
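The excerpt treats lexical repetition rate as a rough proxy for consistency. As an illustration only (the actual metrics in Guillou (2013) and Carpuat and Simard (2012) are more involved), one simple repetition rate is the fraction of content tokens that are repeat occurrences:

```python
# Hedged sketch of a simple repetition-rate proxy; not the metric used in the cited work.
from collections import Counter

def repetition_rate(tokens, stopwords=frozenset()):
    """Fraction of non-stopword tokens that are repeat occurrences of an earlier token."""
    content = [t.lower() for t in tokens if t.lower() not in stopwords]
    if not content:
        return 0.0
    counts = Counter(content)
    repeats = sum(c - 1 for c in counts.values())   # occurrences beyond the first
    return repeats / len(content)

# A system output that reuses "mistake" scores as more repetitive than one that varies its wording.
print(repetition_rate("the mistake was a mistake".split(), stopwords={"the", "a", "was"}))  # 0.5
```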
“…This is 0.6 BLEU points higher than the pre-norm Transformer-Big. It should be noted that although our best score of 29.3 is the same as Ott et al. (2018), our approach requires 3.5x fewer training epochs than theirs.…”
Section: In Both (mentioning)
confidence: 99%
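For readers unfamiliar with the pre-norm Transformer variant mentioned in this excerpt, the difference from the original post-norm design is only where layer normalization sits relative to the residual connection. A schematic sketch follows (illustrative, not the cited implementation; `sublayer` stands for self-attention or the feed-forward block):

```python
# Sketch of post-norm vs. pre-norm residual orderings in a Transformer block.
import torch
import torch.nn as nn

class PostNormBlock(nn.Module):
    """Original Transformer ordering: residual add first, then LayerNorm."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))

class PreNormBlock(nn.Module):
    """Pre-norm variant: LayerNorm before the sublayer, residual kept outside."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

# Example with a feed-forward sublayer; a self-attention sublayer slots in the same way.
ffn = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
x = torch.randn(8, 10, 512)
print(PreNormBlock(512, ffn)(x).shape, PostNormBlock(512, ffn)(x).shape)
```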
“…Currently, existing Transformer-based speech applications [2]-[4] still lack an open-source toolkit and reproducible experiments, while previous studies in NMT [5], [6] provide them. Therefore, we work on an open, community-driven project for end-to-end speech applications using both Transformer and RNN, following the success of Kaldi for hidden Markov model (HMM)-based ASR [7].…”
Section: Introduction (mentioning)
confidence: 99%