2019
DOI: 10.48550/arxiv.1902.03249
Preprint

Insertion Transformer: Flexible Sequence Generation via Insertion Operations

Mitchell Stern, William Chan, Jamie Kiros, et al.
Cited by 18 publications (39 citation statements)
References 0 publications
“…Future work might consider selecting more than one element from N(x_t) to expand at each step, instead of a single i_t. This would amortize the encoding cost across multiple non-terminal expansions, similar to Welleck et al. [37] and Stern et al. [32].…”
Section: Practical Considerations (mentioning)
confidence: 98%
“…However, for a general-purpose programming language such a method is prohibitive. Recently, sequence generation approaches that go beyond the left-to-right paradigm have been proposed [37,32,11,10,17,30], usually by treating generation as an iterative refinement procedure that changes or extends a sequence in every iteration. These models often aim at speeding up inference or at letting the model find a better order for generating a full sentence (of terminal tokens).…”
Section: Related Work (mentioning)
confidence: 99%
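For readers unfamiliar with the insertion-based decoding these statements refer to, below is a minimal illustrative sketch (not taken from the cited paper) of a greedy parallel insertion decoding loop. The `predict_insertions` callable and the `EOS` slot marker are hypothetical stand-ins for a trained model's per-slot predictions.

```python
# Illustrative sketch of parallel insertion decoding: the output is built by
# repeatedly inserting tokens into slots of a partial hypothesis rather than
# appending tokens left to right. `predict_insertions` is a hypothetical
# stand-in for a trained model that, given a partial sequence of length n,
# returns one (token, score) prediction for each of its n + 1 slots.

from typing import Callable, List, Tuple

EOS = "<eos>"  # hypothetical end-of-slot marker meaning "insert nothing here"

def insertion_decode(
    predict_insertions: Callable[[List[str]], List[Tuple[str, float]]],
    max_iters: int = 50,
) -> List[str]:
    """Greedy parallel insertion decoding: at every iteration, insert the
    predicted token into each slot whose prediction is not EOS."""
    hypothesis: List[str] = []
    for _ in range(max_iters):
        slot_predictions = predict_insertions(hypothesis)
        new_hypothesis: List[str] = []
        inserted = False
        # Iterate over the n tokens plus a trailing sentinel for the final slot.
        for i, token in enumerate(hypothesis + [None]):
            pred_token, _score = slot_predictions[i]
            if pred_token != EOS:
                new_hypothesis.append(pred_token)  # fill slot i
                inserted = True
            if token is not None:
                new_hypothesis.append(token)  # keep the existing token
        hypothesis = new_hypothesis
        if not inserted:  # every slot predicted EOS: generation is finished
            break
    return hypothesis
```

With a model trained toward the balanced binary-tree ordering described in the paper, such a loop can reportedly emit a length-n sequence in roughly log2(n) parallel insertion steps rather than n left-to-right steps.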
“…There has been extensive interest in non-autoregressive/parallel generation approaches, which aim to produce a sequence in parallel in sub-linear time with respect to sequence length [13,52,26,65,53,14,11,12,48,15,28,16,49,55,30,41,64,62]. Existing approaches can be broadly classified as latent-variable based [13,26,65,28,41], refinement-based [25,48,14,15,11,30,12,62], or a combination of both [41].…”
Section: Related Work (mentioning)
confidence: 99%