Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018
DOI: 10.18653/v1/p18-1008
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

Abstract: The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first out-performed by the convolutional seq2seq model, which was then outperformed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new archi…

Cited by 297 publications (206 citation statements). References 29 publications (23 reference statements).
“…There are many works we are yet to explore. For example, our experiments did not show to what extent transformer's superior performance comes from replacing recurrence with self-attention, while other modeling techniques from transformer can be borrowed to improve RNNs as well [42]. The quadratically growing cost with respect to the length of speech signals is still a major blocker for transformer-based acoustic models to be used in practice.…”
Section: Discussion
confidence: 85%
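The quadratic cost noted in this statement comes from self-attention's length-by-length score matrix. The NumPy sketch below is purely illustrative (the function name, dimensions, and single-head setup are assumptions, not taken from the cited work); it shows how the score matrix, and hence compute and memory, grows with the number of input frames.

```python
# Minimal single-head self-attention sketch (illustrative only).
# The score matrix has shape (seq_len, seq_len), so cost grows quadratically
# with input length -- the bottleneck noted above for long speech signals.
import numpy as np

def self_attention(x, wq, wk, wv):
    """x: (seq_len, d_model); wq/wk/wv: (d_model, d_head) projections."""
    q, k, v = x @ wq, x @ wk, x @ wv             # each (seq_len, d_head)
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (seq_len, seq_len) -- O(n^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                           # (seq_len, d_head)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_head = 64, 32
    for seq_len in (100, 1000):                  # e.g. frames of a speech signal
        x = rng.standard_normal((seq_len, d_model))
        wq, wk, wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        out = self_attention(x, wq, wk, wv)
        # 10,000 vs 1,000,000 score entries: 100x cost for a 10x longer input
        print(seq_len, out.shape, seq_len * seq_len)
```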
“…Our streaming and re-translation models are implemented in Lingvo (Shen et al., 2019), sharing architecture and hyper-parameters wherever possible. Our RNMT+ architecture (Chen et al., 2018) consists of a 6 layer LSTM encoder and an 8 layer decoder. Table 2: An example of proportional prefix training. Each example in the minibatch has a 50% chance to be truncated, in which case, we truncate its source and target to a randomly-selected fraction of their original lengths, 1/3 in this example.…”
Section: Models
confidence: 99%
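A minimal sketch of the proportional prefix truncation described in the quoted Table 2 caption. The function name, the uniform sampling of the truncation fraction, and the token-list batch format are assumptions made for illustration, not details from the cited paper.

```python
# Sketch of proportional prefix truncation (illustrative; the fraction
# distribution and data format are assumptions, not from the cited work).
# Each (source, target) pair is truncated with probability 0.5 to the same
# randomly chosen fraction of its original length.
import random

def proportional_prefix_batch(batch, truncate_prob=0.5, rng=random):
    """batch: list of (source_tokens, target_tokens) pairs."""
    out = []
    for src, tgt in batch:
        if rng.random() < truncate_prob:
            frac = rng.random()                       # e.g. 1/3 in the quoted example
            src = src[:max(1, int(len(src) * frac))]  # keep a proportional source prefix
            tgt = tgt[:max(1, int(len(tgt) * frac))]  # same fraction of the target
        out.append((src, tgt))
    return out

if __name__ == "__main__":
    batch = [(["wie", "geht", "es", "dir", "heute", "?"],
              ["how", "are", "you", "today", "?"])]
    print(proportional_prefix_batch(batch))
```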
“…We employ frequent casing for the IWSLT tasks while lowercase for the LibriSpeech. There, the evaluation of the IWSLT En→De is case-sensitive, while that of the LibriSpeech is case-insensitive 4 . The translation models are evaluated using the official scripts of WMT campaign, i.e.…”
Section: Pre-training
confidence: 99%
“…The success of deep neural network (DNN) in both machine translation (MT) [1][2][3][4][5] and automatic speech recognition (ASR) [6][7][8][9][10] has inspired the work of end-to-end speech to text translation (ST) systems. The traditional ST methods are based on a consecutive cascaded pipeline of ASR and MT systems.…”
Section: Introduction
confidence: 99%