…Besides RAG (Lewis et al., 2020b), we consider several competitive prior methods that incorporate knowledge, including: a) GPT2 + Knowledge (Guan et al., 2020), which post-trains GPT2 on augmentation data converted from knowledge triples (e.g., ⟨helium, is, gas⟩ → "helium is gas."); b) GPT2 + COMET Embeddings, which fuses knowledge-aware embeddings (e.g., from COMET (Bosselut et al., 2019)); c) FiD-T5 (Izacard & Grave, 2021a;b), which concatenates retrieved grounding documents with the context to form a new input; d) QA-GNN (Yasunaga et al., 2021), which reasons over a knowledge graph with a graph neural network; and e) Reflective Decoding (West et al., 2021), which relies on forward and backward LMs to encode bi-directional context for generation. As shown in Table 2 (columns with 100% training data), KID outperforms all other methods on ELI5 (a 5.2-point ROUGE-L improvement over the second-best method, RAG) and achieves competitive results while requiring neither a task-specific model architecture nor additional training to infuse knowledge.…
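To make the triple-to-sentence augmentation in a) concrete, the following is a minimal Python sketch of that preprocessing step under our assumptions; the function name `triple_to_sentence` and the second example triple are illustrative, not the authors' actual code.

```python
# Minimal sketch (our assumption of the GPT2 + Knowledge preprocessing,
# not the authors' script): render knowledge triples as short natural-language
# sentences that serve as post-training data for GPT2.

def triple_to_sentence(head: str, relation: str, tail: str) -> str:
    """Render a <head, relation, tail> triple as a plain sentence."""
    return f"{head} {relation} {tail}."

triples = [
    ("helium", "is", "gas"),          # example given in the paper
    ("water", "boils at", "100 C"),   # hypothetical additional triple
]

# Each rendered sentence becomes one line of augmentation text.
augmentation_data = [triple_to_sentence(*t) for t in triples]
print(augmentation_data)  # ['helium is gas.', 'water boils at 100 C.']
```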