Abstract: Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam-search. …
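To make the insertion-operation view concrete: a hypothesis grows by repeatedly inserting a token at a chosen slot of the current partial sequence, so any generation order is expressible. A minimal sketch in Python (the function name and the hard-coded step list are ours for illustration; InDIGO learns where to insert):

```python
def insert_decode(steps):
    """Build a sequence from (token, slot) insertion operations.

    Each step inserts `token` at index `slot` of the current partial
    sequence, so any generation order -- not just left-to-right -- can
    be expressed as a sequence of insertions.
    """
    seq = []
    for token, slot in steps:
        seq.insert(slot, token)
    return seq

# "a dog barks" generated in a non-left-to-right order:
print(insert_decode([("dog", 0), ("barks", 1), ("a", 0)]))
# -> ['a', 'dog', 'barks']
```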
“…We train with a simple masking scheme where the number of masked target tokens is distributed uniformly, presenting the model with both easy (single mask) and difficult (completely masked) examples. Unlike recently proposed insertion models (Gu et al., 2019; Stern et al., 2019), which treat each token as a separate training instance, CMLMs can train from the entire sequence in parallel, resulting in much faster training.…”
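The uniform masking scheme is easy to make concrete. Below is a minimal sketch (the names `MASK` and `mask_for_training` are ours, not the paper's code):

```python
import random

MASK = "<mask>"  # stand-in for the model's mask token id

def mask_for_training(target_tokens):
    """Sample one CMLM training example with the uniform masking scheme.

    The number of masked tokens is drawn uniformly from 1..len(target),
    so the model sees both easy (one mask) and hard (all masked) cases.
    """
    n = len(target_tokens)
    k = random.randint(1, n)                     # uniform over 1..n
    masked = set(random.sample(range(n), k))     # which positions to hide
    inputs = [MASK if i in masked else tok
              for i, tok in enumerate(target_tokens)]
    return inputs, masked                        # predict tokens at `masked`
```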
Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
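The constant-iteration decoding strategy is a short loop. Here is an illustrative NumPy sketch under our own assumptions: `model(src, tgt)` stands in for a conditional masked language model that returns per-position token probabilities, and the linearly decaying re-masking schedule is our reading of the approach, so treat this as a sketch rather than the authors' implementation:

```python
import numpy as np

def mask_predict(model, src, tgt_len, iterations=10, mask_id=0):
    """Iterative mask-predict decoding (sketch, not the authors' code).

    `model(src, tgt)` is assumed to return an array of shape
    (tgt_len, vocab_size) with per-position token probabilities.
    """
    tgt = np.full(tgt_len, mask_id)      # iteration 0: fully masked target
    conf = np.zeros(tgt_len)             # per-position model confidence
    for t in range(iterations):
        probs = model(src, tgt)          # predict all positions in parallel
        masked = tgt == mask_id
        tgt[masked] = probs.argmax(-1)[masked]
        conf[masked] = probs.max(-1)[masked]
        if t + 1 == iterations:
            break
        # linearly shrink the number of re-masked tokens each iteration
        n_mask = int(tgt_len * (iterations - t - 1) / iterations)
        lowest = np.argsort(conf)[:n_mask]   # least-confident positions
        tgt[lowest] = mask_id                # mask out and regenerate
    return tgt
```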
“…In parallel to the work investigating masked language models for text generation, Welleck et al. [74], Stern et al. [75] and Gu et al. [76] proposed methods for non-monotonic sequential text generation. Although these methods could be applied to generating molecular graphs in flexible orderings, there has been no work empirically validating this.…”
De novo, in silico design of molecules is a challenging problem with applications in drug discovery and material design. We introduce a masked graph model, which learns a distribution over graphs by capturing conditional distributions over unobserved nodes (atoms) and edges (bonds) given observed ones. We train and then sample from our model by iteratively masking and replacing different parts of initialized graphs. We evaluate our approach on the QM9 and ChEMBL datasets using the GuacaMol distribution-learning benchmark. We find that validity, KL-divergence and Fréchet ChemNet Distance scores are anti-correlated with novelty, and that we can trade off between these metrics more effectively than existing models. On distributional metrics, our model outperforms previously proposed graph-based approaches and is competitive with SMILES-based approaches. Finally, we show our model generates molecules with desired values of specified properties while maintaining physicochemical similarity to the training distribution.
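The iterative masking-and-replacing sampler follows the same pattern as mask-predict for text, applied to graph components instead of tokens. A speculative sketch, assuming a `graph` represented as a mapping from component ids (atoms and bonds) to labels and a hypothetical `model.predict` that fills in masked components:

```python
import random

MASK = "<mask>"  # stand-in for a masked atom or bond label

def sample_graph(model, graph, steps=100, frac=0.1):
    """Sample from a masked graph model by iterative mask-and-replace.

    `graph` is assumed to map component ids (atoms and bonds) to labels;
    `model.predict(graph, ids)` is a hypothetical call returning new
    labels for the masked components given the observed ones.
    """
    for _ in range(steps):
        k = max(1, int(frac * len(graph)))
        ids = random.sample(list(graph), k)   # components to corrupt
        for i in ids:
            graph[i] = MASK                   # mask some nodes/edges
        for i, label in model.predict(graph, ids).items():
            graph[i] = label                  # replace with predictions
    return graph
```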
“…In recent work, several insertion-based frameworks have been proposed for the generation of sequences in a non-left-to-right fashion for machine translation (Stern et al., 2019; Welleck et al., 2019; Gu et al., 2019). Stern et al. (2019) […] and balanced binary tree orders.…”
“…More recently, a number of novel insertion-based architectures have been developed for sequence generation (Gu et al., 2019; Stern et al., 2019; Welleck et al., 2019). These frameworks license a diverse set of generation orders, including uniform (Welleck et al., 2019), random (Gu et al., 2019), or balanced binary trees (Stern et al., 2019). Some of them also match the quality of state-of-the-art left-to-right models (Stern et al., 2019).…”
In this work, we present an empirical study of generation order for machine translation. Building on recent advances in insertion-based modeling, we first introduce a soft order-reward framework that enables us to train models to follow arbitrary oracle generation policies. We then make use of this framework to explore a large variety of generation orders, including uninformed orders, location-based orders, frequency-based orders, content-based orders, and model-based orders. Curiously, we find that for the WMT'14 English → German and WMT'18 English → Chinese translation tasks, order does not have a substantial impact on output quality. Moreover, for English → German, we even discover that unintuitive orderings such as alphabetical and shortest-first can match the performance of a standard Transformer, suggesting that traditional left-to-right generation may not be necessary to achieve high performance.
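Two of the "unintuitive" oracle orderings mentioned above are simple to state as code. A small illustrative sketch (ours, not the paper's framework) that maps a tokenized target to a generation order, which an insertion-based model would then consume as a sequence of insertions:

```python
def alphabetical_order(tokens):
    """Indices of `tokens` in alphabetical generation order."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i])

def shortest_first_order(tokens):
    """Indices of `tokens` shortest-first, ties broken left-to-right."""
    return sorted(range(len(tokens)), key=lambda i: (len(tokens[i]), i))

tokens = "the quick brown fox".split()
print([tokens[i] for i in alphabetical_order(tokens)])
# -> ['brown', 'fox', 'quick', 'the']
print([tokens[i] for i in shortest_first_order(tokens)])
# -> ['the', 'fox', 'quick', 'brown']
```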