Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018
DOI: 10.18653/v1/p18-1008
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

Abstract: The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first out-performed by the convolutional seq2seq model, which was then outperformed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new archi…

Cited by 297 publications (206 citation statements). References 29 publications (23 reference statements).
“…There are many works we are yet to explore. For example, our experiments did not show to what extent transformer's superior performance comes from replacing recurrence with self-attention, while other modeling techniques from transformer can be borrowed to improve RNNs as well [42]. The quadratically growing cost with respect to the length of speech signals is still a major blocker for transformer-based acoustic models to be used in practice.…”
Section: Discussion
confidence: 85%
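The quadratic cost noted in this statement comes from self-attention's length-by-length score matrix. The NumPy sketch below is purely illustrative (the function name, dimensions, and single-head setup are assumptions, not taken from the cited work); it shows how the score matrix, and hence compute and memory, grows with the number of input frames.

```python
# Minimal single-head self-attention sketch (illustrative only).
# The score matrix has shape (seq_len, seq_len), so cost grows quadratically
# with input length -- the bottleneck noted above for long speech signals.
import numpy as np

def self_attention(x, wq, wk, wv):
    """x: (seq_len, d_model); wq/wk/wv: (d_model, d_head) projections."""
    q, k, v = x @ wq, x @ wk, x @ wv             # each (seq_len, d_head)
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (seq_len, seq_len) -- O(n^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                           # (seq_len, d_head)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_head = 64, 32
    for seq_len in (100, 1000):                  # e.g. frames of a speech signal
        x = rng.standard_normal((seq_len, d_model))
        wq, wk, wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        out = self_attention(x, wq, wk, wv)
        # 10,000 vs 1,000,000 score entries: 100x cost for a 10x longer input
        print(seq_len, out.shape, seq_len * seq_len)
```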
“…Our streaming and re-translation models are implemented in Lingvo (Shen et al., 2019), sharing architecture and hyper-parameters wherever possible. Our RNMT+ architecture (Chen et al., 2018) consists of a 6 layer LSTM encoder and an 8 layer decoder. Table 2: An example of proportional prefix training. Each example in the minibatch has a 50% chance to be truncated, in which case, we truncate its source and target to a randomly-selected fraction of their original lengths, 1/3 in this example.…”
Section: Models
confidence: 99%
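A minimal sketch of the proportional prefix truncation described in the quoted Table 2 caption. The function name, the uniform sampling of the truncation fraction, and the token-list batch format are assumptions made for illustration, not details from the cited paper.

```python
# Sketch of proportional prefix truncation (illustrative; the fraction
# distribution and data format are assumptions, not from the cited work).
# Each (source, target) pair is truncated with probability 0.5 to the same
# randomly chosen fraction of its original length.
import random

def proportional_prefix_batch(batch, truncate_prob=0.5, rng=random):
    """batch: list of (source_tokens, target_tokens) pairs."""
    out = []
    for src, tgt in batch:
        if rng.random() < truncate_prob:
            frac = rng.random()                       # e.g. 1/3 in the quoted example
            src = src[:max(1, int(len(src) * frac))]  # keep a proportional source prefix
            tgt = tgt[:max(1, int(len(tgt) * frac))]  # same fraction of the target
        out.append((src, tgt))
    return out

if __name__ == "__main__":
    batch = [(["wie", "geht", "es", "dir", "heute", "?"],
              ["how", "are", "you", "today", "?"])]
    print(proportional_prefix_batch(batch))
```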
“…We employ frequent casing for the IWSLT tasks while lowercase for the LibriSpeech. There, the evaluation of the IWSLT En→De is case-sensitive, while that of the LibriSpeech is case-insensitive 4 . The translation models are evaluated using the official scripts of WMT campaign, i.e.…”
Section: Pre-training
confidence: 99%
“…The success of deep neural network (DNN) in both machine translation (MT) [1][2][3][4][5] and automatic speech recognition (ASR) [6][7][8][9][10] has inspired the work of end-to-end speech to text translation (ST) systems. The traditional ST methods are based on a consecutive cascaded pipeline of ASR and MT systems.…”
Section: Introduction
confidence: 99%