2022
DOI: 10.48550/arxiv.2203.02155
Preprint

Training language models to follow instructions with human feedback

Abstract: Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we …
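
The fine-tuning-with-human-feedback pipeline described in the abstract hinges on a reward model trained from labelers' pairwise comparisons of model outputs. As a rough illustration only, the sketch below shows the pairwise ranking loss behind that kind of reward modeling in PyTorch; RewardModel, preference_loss, the 768-dimensional pooled embeddings, and the random inputs are hypothetical stand-ins, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy scalar reward head over a pooled prompt+response representation (illustrative only)."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, hidden_size) -> one scalar reward per example
        return self.score(pooled).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected): pushes the reward of the
    # labeler-preferred response above the reward of the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    rm = RewardModel()
    pooled_chosen = torch.randn(4, 768)    # stand-ins for encoder outputs of preferred responses
    pooled_rejected = torch.randn(4, 768)  # stand-ins for encoder outputs of rejected responses
    loss = preference_loss(rm(pooled_chosen), rm(pooled_rejected))
    loss.backward()
    print(f"pairwise preference loss: {loss.item():.4f}")

In the full pipeline, the reward model's scalar output is then used as the training signal for reinforcement learning (PPO in the paper) on top of a supervised fine-tuned model.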

Cited by 395 publications (507 citation statements)
References 65 publications (140 reference statements)
“…Looking forward, we expect our model performance to continue to increase with more parameters, data, and training steps [39,32]. Moreover, fine-tuning would allow our models to be better able to condition on natural language instructions and other indications of human intent [76,65,48]. Finally, our model lays a foundation for future work on supervised infilling & editing via model fine-tuning, as well as performing iterative decoding, where the model can be used to refine its own output [27].…”
Section: Discussion (mentioning)
confidence: 97%
“…For this reason, preference learning, uncertainty modeling and value alignment (Russell, 2019) are especially important for the design of human-compatible generalist agents. It may be possible to extend some of the value alignment approaches for language (Kenton et al., 2021; Ouyang et al., 2022) to generalist agents. However, even as technical solutions are developed for value alignment, generalist systems could still have negative societal impacts even with the intervention of well-intentioned designers, due to unforeseen circumstances or limited oversight (Amodei et al., 2016).…”
Section: Broader Impact (mentioning)
confidence: 99%
“…Wei et al. (2021) fine-tuned Google's internal 137B parameter pretrained LM on their curated suite of 60 datasets, producing a multi-tasked model called FLAN. Min et al. (2021) fine-tuned the 770M parameter GPT2 (Radford et al., 2019) on a curated suite of 142 datasets, and Ouyang et al. (2022) fine-tuned the 175B parameter GPT3 (Brown et al., 2020) on disparate datasets of human instructions, using reinforcement learning from human feedback, producing a new multi-tasked InstructGPT model.…”
Section: Input-dependent Prompt Tuning For Multi-tasking a Frozen LM (mentioning)
confidence: 99%
“…A side effect of doing so is that performance degrades significantly on other tasks. Partly in response, considerable recent work has been devoted to fine-tuning huge LMs simultaneously on many (in some cases, over 100) curated NLP tasks (Sanh et al., 2021; Wei et al., 2021; Min et al., 2021; Aribandi et al., 2021; Ouyang et al., 2022). These formidable efforts have been effective in the sense that they have produced models that exhibit high performance on inputs taken from any of the curated tasks, and, indeed, from other similar tasks.…”
Section: Introduction (mentioning)
confidence: 99%