Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1336

End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification

Abstract: Autoregressive decoding is the only part of sequence-to-sequence models that prevents them from massive parallelization at inference time. Non-autoregressive models enable the decoder to generate all output symbols independently in parallel. We present a novel non-autoregressive architecture based on connectionist temporal classification and evaluate it on the task of neural machine translation. Unlike other non-autoregressive methods which operate in several steps, our model can be trained end-to-end. We condu…
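The training setup described in the abstract can be summarized with a short, hedged sketch: a decoder emits logits for every output position in parallel, and a CTC loss marginalizes over all alignments of the reference translation to those positions. The snippet below uses PyTorch's nn.CTCLoss; the tensor shapes, vocabulary size, and decoder length are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code): train a parallel decoder with CTC.
# Assumptions: PyTorch, a blank symbol at index 0, and a decoder length T_DEC
# fixed to a multiple of the source length so it always exceeds the target length.
import torch
import torch.nn as nn
import torch.nn.functional as F

BLANK = 0        # index reserved for the CTC blank symbol
VOCAB = 32000    # target vocabulary size (including the blank) -- assumed value
T_DEC = 60       # number of parallel decoder positions -- assumed value
BATCH = 8

ctc = nn.CTCLoss(blank=BLANK, zero_infinity=True)

# Stand-in for the decoder output: one logit vector per position, produced in parallel.
logits = torch.randn(BATCH, T_DEC, VOCAB, requires_grad=True)

# Padded reference translations (token ids > 0) and their true lengths.
targets = torch.randint(1, VOCAB, (BATCH, 25))
target_lengths = torch.full((BATCH,), 25, dtype=torch.long)
input_lengths = torch.full((BATCH,), T_DEC, dtype=torch.long)

# nn.CTCLoss expects (T, batch, vocab) log-probabilities; it sums over all
# alignments of the reference to the T_DEC positions, so no decoding order is needed.
log_probs = F.log_softmax(logits, dim=-1).transpose(0, 1)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # one backward pass; all positions are predicted independently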

Cited by 116 publications (141 citation statements)
References 12 publications
“…We compare our approach to three other parallel decoding translation methods: the fertility-based sequence-to-sequence model of Gu et al. (2018), the CTC-loss transformer of Libovický and Helcl (2018), and the iterative refinement approach of Lee et al. (2018). The first two methods are purely non-autoregressive, while the iterative refinement approach is only non-autoregressive in the first decoding iteration, similar to our approach.…”
Section: Translation Quality
confidence: 99%
“…Gu et al. (2018) introduce a transformer-based approach with explicit word fertility, and identify the multi-modality problem. Libovický and Helcl (2018) approach the multi-modality problem by collapsing repetitions with the Connectionist Temporal Classification training objective (Graves et al., 2006). Perhaps most similar to our work is the iterative refinement approach of Lee et al. (2018), in which the model corrects the original non-autoregressive prediction by passing it multiple times through a denoising autoencoder.…”
Section: Parallel Decoding for Machine Translation
confidence: 99%
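For context on the "collapsing repetitions" step mentioned in the statement above: in CTC, the per-position predictions are turned into an output sentence by merging consecutive duplicate labels and then dropping blank symbols. The function below is a generic greedy-decoding sketch of that rule (the blank index and example values are assumptions), not code from the cited paper.

from typing import List

BLANK = 0  # assumed index of the CTC blank symbol

def ctc_collapse(frame_labels: List[int], blank: int = BLANK) -> List[int]:
    # Keep a label only when it starts a new run and is not blank,
    # i.e. merge consecutive repeats first, then remove blanks.
    output, previous = [], None
    for label in frame_labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output

# Per-position argmax "_ 5 5 _ 7 7 7 _ 5" collapses to [5, 7, 5]:
print(ctc_collapse([0, 5, 5, 0, 7, 7, 7, 0, 5]))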
“…We first conduct experiments to compare the performance of FlowSeq with strong baseline models, including NAT w/ Fertility (Gu et al., 2018), NAT-IR (Lee et al., 2018), NAT-REG (Wang et al., 2019), LV NAR (Shu et al., 2019), CTC Loss (Libovický and Helcl, 2018), and CMLM (Ghazvininejad et al., 2019). Table 1 provides the BLEU scores of FlowSeq with argmax decoding, together with baselines with purely non-autoregressive decoding methods that generate the output sequence in one parallel pass.…”
Section: Results
confidence: 99%
“…Lee et al. (2018) proposed a method of iterative refinement based on a latent variable model and a denoising autoencoder. Libovický and Helcl (2018) treat NAT as a connectionist temporal classification problem, which achieves better latency. Kaiser et al. (2018) use discrete latent variables, which makes decoding much more parallelizable.…”
Section: Ablation Study
confidence: 99%
“…* This work was done when the first author was on an internship at Tencent. Recently, a line of research (Gu et al., 2017; Lee et al., 2018; Libovický and Helcl, 2018; Wang et al., 2018) has proposed to break the autoregressive bottleneck by introducing non-autoregressive neural machine translation (NAT). In NAT, the decoder generates all words simultaneously instead of sequentially.…”
Section: Introduction
confidence: 99%