Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.464
An Empirical Study of Generation Order for Machine Translation

Abstract: In this work, we present an empirical study of generation order for machine translation. Building on recent advances in insertion-based modeling, we first introduce a soft order-reward framework that enables us to train models to follow arbitrary oracle generation policies. We then make use of this framework to explore a large variety of generation orders, including uninformed orders, location-based orders, frequency-based orders, content-based orders, and model-based orders. Curiously, we find that for the WMT'1…

Cited by 4 publications (6 citation statements)
References 7 publications (9 reference statements)
“…The Insertion Transformer has been shown to generate sequences in a logarithmic number of iterations, O(log n) (Stern et al., 2019), whereas the Imputer requires a constant number of generation steps. Additionally, the Insertion Transformer, with its vastly non-monotonic generation order (Chan et al., 2019c), lacks the monotonicity that the Imputer's dynamic programming endows; consequently, we found the Insertion Transformer difficult to apply to speech recognition applications.…”
Section: Related Work
Mentioning confidence: 96%
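The logarithmic-iteration claim quoted above rests on the balanced binary-tree insertion order of Stern et al. (2019): if, at every decoding step, each slot of the partial canvas receives the middle token of the span it still has to produce, and all slots are filled in parallel, a target of length n is completed in roughly ceil(log2(n + 1)) steps rather than the n steps of left-to-right decoding. The snippet below is a minimal, illustrative simulation of that counting argument (the function name and simulation are hypothetical; it is not code from the cited papers).

```python
# Minimal sketch: why balanced, parallel insertion finishes in ~log2(n) steps
# while left-to-right decoding needs n. Each pending span is the chunk of the
# target that still has to be generated inside one slot of the current canvas;
# at every step, every span contributes its middle token in parallel.
import math

def balanced_insertion_steps(target):
    """Simulate parallel middle-token insertion and count decoding steps."""
    pending = [list(target)]
    steps = 0
    inserted = 0
    while pending:
        next_pending = []
        for span in pending:
            mid = len(span) // 2
            inserted += 1                      # one token inserted per slot
            left, right = span[:mid], span[mid + 1:]
            if left:
                next_pending.append(left)
            if right:
                next_pending.append(right)
        pending = next_pending
        steps += 1
    return steps, inserted

if __name__ == "__main__":
    for n in (1, 7, 32, 1000):
        steps, inserted = balanced_insertion_steps(range(n))
        # Left-to-right decoding would take n steps; balanced insertion takes
        # about ceil(log2(n + 1)), and every target token is inserted exactly once.
        print(n, steps, math.ceil(math.log2(n + 1)), inserted == n)
```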
“…Recently, there has been a growing interest in models that make a trade-off between the two extremes of fully autoregressive and fully non-autoregressive generation, such as the Insertion Transformer (Stern et al., 2019), Mask-Predict (Ghazvininejad et al., 2019), the Levenshtein Transformer (Gu et al., 2019b) and Multilingual KERMIT (Chan et al., 2019a). Such models sacrifice almost no performance, while requiring only a logarithmic (Chan et al., 2019c) or a constant number of generation steps (Lee et al., 2018).…”
Section: Introduction
Mentioning confidence: 99%
“…There has been significant prior work on non-autoregressive iterative methods for machine translation (Gu et al., 2018), some of which are: iterative refinement (Lee et al., 2018), insertion-based methods (Chan et al., 2019a; Li and Chan, 2019), and conditional masked language models (Ghazvininejad et al., 2019, 2020b). Like insertion-based models (Chan et al., 2019c), our work does not commit to a fixed target length; insertion-based models can dynamically grow the canvas size, whereas our work, which relies on a latent alignment, can only generate a target sequence up to a fixed, predetermined maximum length. Compared to conditional masked language models (Ghazvininejad et al., 2019, 2020b), the key differences are: 1) our models do not require target length prediction, and 2) we eschew the encoder-decoder architecture formulation and instead rely on a single, simple decoder architecture.…”
Section: Related Work
Mentioning confidence: 99%
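The contrast drawn in the statement above is mechanical: an insertion-based decoder's canvas grows as tokens are inserted, so no target length has to be committed to in advance, whereas a latent-alignment (or masked) decoder writes into a canvas of fixed, predetermined maximum length. The toy sketch below illustrates only that difference; the slot convention, the mask/blank symbols, and the example insertions are hypothetical and not taken from the cited models.

```python
# Toy contrast between a growing insertion canvas and a fixed-length canvas.
MASK = "<mask>"
BLANK = "_"

def insertion_step(canvas, insertions):
    """Insert (slot, token) pairs; slot i is the gap before position i. The canvas grows."""
    out = list(canvas)
    for slot, token in sorted(insertions, reverse=True):
        out.insert(slot, token)
    return out

def fixed_length_step(canvas, predictions):
    """Overwrite masked positions in place; the canvas length never changes."""
    return [predictions.get(i, tok) if tok == MASK else tok
            for i, tok in enumerate(canvas)]

if __name__ == "__main__":
    # Insertion-based: start from an empty canvas and let it grow.
    canvas = []
    canvas = insertion_step(canvas, [(0, "hat")])
    canvas = insertion_step(canvas, [(0, "er"), (1, "gegessen")])
    print(canvas)   # ['er', 'hat', 'gegessen'] -- length never fixed in advance

    # Latent-alignment / masked: must pick a maximum length (here 5) up front.
    canvas = [MASK] * 5
    canvas = fixed_length_step(canvas, {0: "er", 2: "gegessen"})
    canvas = fixed_length_step(canvas, {1: "hat", 3: BLANK, 4: BLANK})
    print(canvas)   # ['er', 'hat', 'gegessen', '_', '_']
```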
“…The generation is sometimes not fluent, as multiple tokens may compete for the same meaning. The insertion-based methods (Stern et al., 2019; Gu et al., 2019a; Welleck et al., 2019; Chan et al., 2020; Zhang et al., 2020b) also change the standard left-to-right generation by allowing tokens to be inserted dynamically during the generation process. This provides a good balance between generation fluency and efficiency, and does not require predicting target lengths first.…”
Section: Related Work
Mentioning confidence: 99%
“…For machine translation, we utilize the widely used WMT 2014 English-German translation dataset (Bojar et al., 2014), with newstest2013 as the development set and newstest2014 as the test set. Following previous work (Stern et al., 2019; Chan et al., 2020), we apply sequence-level knowledge distillation (Hinton et al., 2015; Kim and Rush, 2016) from a left-to-right autoregressive model, which has been found helpful to reduce data complexity and improve the performance of NAT models (Zhou et al., 2020).…”
Section: A. Dataset Details
Mentioning confidence: 99%
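Sequence-level knowledge distillation (Kim and Rush, 2016), as referenced above, replaces each reference translation in the training data with the output of a trained left-to-right autoregressive teacher; the non-autoregressive student is then trained on these simpler, less multi-modal targets. A minimal sketch of that data-preparation step follows (the Teacher class, the translate interface, and the file names are hypothetical placeholders, not the authors' code).

```python
# Minimal sketch of sequence-level knowledge distillation for NAT training
# (Kim and Rush, 2016). All names below are hypothetical placeholders.
from typing import Iterable, List, Tuple

class Teacher:
    """Stands in for a trained left-to-right autoregressive translation model."""
    def translate(self, source: str) -> str:
        # In practice this would run beam-search decoding; here it is a stub.
        raise NotImplementedError

def distill_corpus(teacher: Teacher,
                   sources: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair every training source with the teacher's own translation.

    The NAT student is trained on these (source, teacher_output) pairs instead
    of the original references, which reduces the multi-modality of the targets.
    """
    return [(src.strip(), teacher.translate(src.strip())) for src in sources]

# Usage sketch (assuming a trained teacher and the WMT'14 En-De training sources):
#   with open("train.en") as f:
#       distilled = distill_corpus(teacher, f)
#   ...then train the insertion-based / NAT student on `distilled`.
```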