Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.464
An Empirical Study of Generation Order for Machine Translation

Abstract: In this work, we present an empirical study of generation order for machine translation. Building on recent advances in insertion-based modeling, we first introduce a soft order-reward framework that enables us to train models to follow arbitrary oracle generation policies. We then make use of this framework to explore a large variety of generation orders, including uninformed orders, location-based orders, frequency-based orders, content-based orders, and model-based orders. Curiously, we find that for the WMT'1…

Cited by 4 publications (6 citation statements)
References 7 publications (9 reference statements)
“…The Insertion Transformer has been shown to generate sequences in a logarithmic number of iterations, O(log n) (Stern et al., 2019), whereas the Imputer requires a constant number of generation steps. Additionally, the Insertion Transformer, with its vastly non-monotonic generation order (Chan et al., 2019c), lacks the monotonicity that the Imputer's dynamic programming endows; consequently, we found the Insertion Transformer difficult to apply to speech recognition applications.…”
Section: Related Work
Mentioning confidence: 96%
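The logarithmic-iteration claim quoted above rests on the balanced binary-tree insertion order of Stern et al. (2019): if, at every decoding step, each slot of the partial canvas receives the middle token of the span it still has to produce, and all slots are filled in parallel, a target of length n is completed in roughly ceil(log2(n + 1)) steps rather than the n steps of left-to-right decoding. The snippet below is a minimal, illustrative simulation of that counting argument (the function name and simulation are hypothetical; it is not code from the cited papers).

```python
# Minimal sketch: why balanced, parallel insertion finishes in ~log2(n) steps
# while left-to-right decoding needs n. Each pending span is the chunk of the
# target that still has to be generated inside one slot of the current canvas;
# at every step, every span contributes its middle token in parallel.
import math

def balanced_insertion_steps(target):
    """Simulate parallel middle-token insertion and count decoding steps."""
    pending = [list(target)]
    steps = 0
    inserted = 0
    while pending:
        next_pending = []
        for span in pending:
            mid = len(span) // 2
            inserted += 1                      # one token inserted per slot
            left, right = span[:mid], span[mid + 1:]
            if left:
                next_pending.append(left)
            if right:
                next_pending.append(right)
        pending = next_pending
        steps += 1
    return steps, inserted

if __name__ == "__main__":
    for n in (1, 7, 32, 1000):
        steps, inserted = balanced_insertion_steps(range(n))
        # Left-to-right decoding would take n steps; balanced insertion takes
        # about ceil(log2(n + 1)), and every target token is inserted exactly once.
        print(n, steps, math.ceil(math.log2(n + 1)), inserted == n)
```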
“…Recently, there has been a growing interest in models that make a trade-off between the two extremes of fully autoregressive and fully non-autoregressive generation, such as the Insertion Transformer (Stern et al., 2019), Mask-Predict (Ghazvininejad et al., 2019), the Levenshtein Transformer (Gu et al., 2019b) and Multilingual KERMIT (Chan et al., 2019a). Such models sacrifice almost no performance, while requiring only a logarithmic (Chan et al., 2019c) or a constant number of generation steps (Lee et al., 2018).…”
Section: Introduction
Mentioning confidence: 99%
“…There has been significant prior work on non-autoregressive iterative methods for machine translation (Gu et al., 2018), some of which are: iterative refinement (Lee et al., 2018), insertion-based methods (Chan et al., 2019a; Li and Chan, 2019), and conditional masked language models (Ghazvininejad et al., 2019, 2020b). Like insertion-based models (Chan et al., 2019c), our work does not commit to a fixed target length; insertion-based models can dynamically grow the canvas size, whereas our work, which relies on a latent alignment, can only generate a target sequence up to a fixed, predetermined maximum length. Compared to conditional masked language models (Ghazvininejad et al., 2019, 2020b), the key differences are: 1) our models do not require target length prediction, and 2) we eschew the encoder-decoder architecture formulation and instead rely on a single, simple decoder architecture.…”
Section: Related Work
Mentioning confidence: 99%
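The contrast drawn in the statement above is mechanical: an insertion-based decoder's canvas grows as tokens are inserted, so no target length has to be committed to in advance, whereas a latent-alignment (or masked) decoder writes into a canvas of fixed, predetermined maximum length. The toy sketch below illustrates only that difference; the slot convention, the mask/blank symbols, and the example insertions are hypothetical and not taken from the cited models.

```python
# Toy contrast between a growing insertion canvas and a fixed-length canvas.
MASK = "<mask>"
BLANK = "_"

def insertion_step(canvas, insertions):
    """Insert (slot, token) pairs; slot i is the gap before position i. The canvas grows."""
    out = list(canvas)
    for slot, token in sorted(insertions, reverse=True):
        out.insert(slot, token)
    return out

def fixed_length_step(canvas, predictions):
    """Overwrite masked positions in place; the canvas length never changes."""
    return [predictions.get(i, tok) if tok == MASK else tok
            for i, tok in enumerate(canvas)]

if __name__ == "__main__":
    # Insertion-based: start from an empty canvas and let it grow.
    canvas = []
    canvas = insertion_step(canvas, [(0, "hat")])
    canvas = insertion_step(canvas, [(0, "er"), (1, "gegessen")])
    print(canvas)   # ['er', 'hat', 'gegessen'] -- length never fixed in advance

    # Latent-alignment / masked: must pick a maximum length (here 5) up front.
    canvas = [MASK] * 5
    canvas = fixed_length_step(canvas, {0: "er", 2: "gegessen"})
    canvas = fixed_length_step(canvas, {1: "hat", 3: BLANK, 4: BLANK})
    print(canvas)   # ['er', 'hat', 'gegessen', '_', '_']
```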
“…The generation is sometimes not fluent, as multiple tokens may compete for the same meaning. The insertion-based methods (Stern et al., 2019; Gu et al., 2019a; Welleck et al., 2019; Chan et al., 2020; Zhang et al., 2020b) also change the standard left-to-right generation by allowing tokens to be inserted dynamically during the generation process. This provides a good balance between generation fluency and efficiency, and does not require predicting target lengths first.…”
Section: Related Work
Mentioning confidence: 99%
“…For machine translation, we utilize the widely used WMT 2014 English-German translation dataset (Bojar et al., 2014), with newstest2013 as the development set and newstest2014 as the test set. Following previous work (Stern et al., 2019; Chan et al., 2020), we apply sequence-level knowledge distillation (Hinton et al., 2015; Kim and Rush, 2016) from a left-to-right autoregressive model, which has been found helpful to reduce data complexity and improve the performance of NAT models (Zhou et al., 2020).…”
Section: A. Dataset Details
Mentioning confidence: 99%
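Sequence-level knowledge distillation (Kim and Rush, 2016), as referenced above, replaces each reference translation in the training data with the output of a trained left-to-right autoregressive teacher; the non-autoregressive student is then trained on these simpler, less multi-modal targets. A minimal sketch of that data-preparation step follows (the Teacher class, the translate interface, and the file names are hypothetical placeholders, not the authors' code).

```python
# Minimal sketch of sequence-level knowledge distillation for NAT training
# (Kim and Rush, 2016). All names below are hypothetical placeholders.
from typing import Iterable, List, Tuple

class Teacher:
    """Stands in for a trained left-to-right autoregressive translation model."""
    def translate(self, source: str) -> str:
        # In practice this would run beam-search decoding; here it is a stub.
        raise NotImplementedError

def distill_corpus(teacher: Teacher,
                   sources: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair every training source with the teacher's own translation.

    The NAT student is trained on these (source, teacher_output) pairs instead
    of the original references, which reduces the multi-modality of the targets.
    """
    return [(src.strip(), teacher.translate(src.strip())) for src in sources]

# Usage sketch (assuming a trained teacher and the WMT'14 En-De training sources):
#   with open("train.en") as f:
#       distilled = distill_corpus(teacher, f)
#   ...then train the insertion-based / NAT student on `distilled`.
```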