2022
DOI: 10.48550/arxiv.2205.12548
Preprint

RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning

Abstract: Prompting has shown impressive success in enabling large pretrained language models (LMs) to perform diverse NLP tasks, especially when only a few downstream examples are available. Automatically finding the optimal prompt for each task, however, is challenging. Most existing work resorts to tuning soft prompts (e.g., continuous embeddings), which fall short on interpretability, reusability across LMs, and applicability when gradients are not accessible. Discrete prompts, on the other hand, are difficult to optimize and are often…
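The abstract is truncated at this point. As a rough illustration of the general technique named in the title (treating discrete prompt search as a reinforcement learning problem), the sketch below samples prompt tokens from a small policy, scores them with a placeholder reward, and updates the policy with a REINFORCE-style gradient. Everything here (the toy vocabulary, `PromptPolicy`, `task_reward`) is an illustrative assumption, not the paper's actual architecture or reward design.

```python
# Minimal policy-gradient sketch for discrete prompt search (illustrative only).
import torch
import torch.nn as nn

PROMPT_VOCAB = ["great", "terrible", "movie", "review", "overall", "it", "was"]  # toy vocabulary
PROMPT_LEN = 5

class PromptPolicy(nn.Module):
    """Tiny policy: one categorical distribution per prompt slot."""
    def __init__(self, vocab_size: int, prompt_len: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(prompt_len, vocab_size))

    def sample(self):
        dist = torch.distributions.Categorical(logits=self.logits)
        token_ids = dist.sample()                  # one token id per prompt slot
        log_prob = dist.log_prob(token_ids).sum()  # joint log-probability of the prompt
        return token_ids, log_prob

def task_reward(prompt_tokens):
    """Placeholder reward: in practice, prepend the prompt to each input of a
    frozen LM and return a downstream metric such as accuracy."""
    return float(len(set(prompt_tokens))) / PROMPT_LEN  # toy reward: token diversity

policy = PromptPolicy(len(PROMPT_VOCAB), PROMPT_LEN)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    token_ids, log_prob = policy.sample()
    prompt = [PROMPT_VOCAB[i] for i in token_ids.tolist()]
    reward = task_reward(prompt)
    loss = -reward * log_prob                      # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("Best-guess prompt:", [PROMPT_VOCAB[i] for i in policy.logits.argmax(-1).tolist()])
```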

Cited by 16 publications (29 citation statements). References 48 publications.
“…5.The importance of fluid exchange between artificial and human intelligence in this paradigm is evinced by the rapidly growing interest in prompt engineering , i.e., an increasingly self-aware and theory-driven approach to the role that prompts play in co-creating the outputs of these types of systems (Liu et al, 2022), which has recently been extended to the optimization of text prompts by distinct AI agents (Deng et al, 2022). …”
mentioning
confidence: 99%
“…Furthermore, our work can also be viewed from the perspective of learning discrete prompts for language models. Past work propose to generate knowledge pieces (Liu et al, 2022) or arbitrary textual snippets (Deng et al, 2022) which they append to the input via reinforcement learning. These works are different than ours in that their policy is conditioned solely on the input x whereas in our case we sample critiques of machine-generated predictions based on x and ŷ.…”
Section: Adapters and Discrete Prompt Learning
mentioning
confidence: 99%
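The distinction drawn in the citation statement above (a prompt policy conditioned only on the input x, versus a critique policy also conditioned on the prediction ŷ) can be summarized in a short sketch. The `DummyPolicy` class and its `generate()` method are assumptions made for illustration, not the API of either paper.

```python
# Illustrative contrast between the two conditioning schemes described above.

class DummyPolicy:
    def generate(self, condition) -> str:
        # Stand-in for an RL-trained text generator.
        return f"tokens conditioned on {condition!r}"

def sample_prompt(policy: DummyPolicy, x: str) -> str:
    """RLPrompt-style: the appended text depends only on the task input x."""
    return policy.generate(condition=x)

def sample_critique(policy: DummyPolicy, x: str, y_hat: str) -> str:
    """Setup of the citing work: the sampled text critiques the prediction,
    so it is conditioned on both x and y_hat."""
    return policy.generate(condition=(x, y_hat))

policy = DummyPolicy()
print(sample_prompt(policy, "a gripping film"))
print(sample_critique(policy, "a gripping film", "negative"))
```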
“…For example, CLIP [81] adopts linear probing [12,31,32,109] and full-finetuning [25,31,48,99,101,109] when transferring to downstream tasks. Prompt adaptation of CLIP [63,81,105,112,114] is motivated by the success of prefix-tuning for language models [16,22,30,45,61,78,84,85,89]. Similarly, CLIP-Adapter [21] and Tip-Adapter [111] are inspired by parameter-efficient finetuning methods [39,44,110] that optimize lightweight MLPs while freezing the encoder.…”
Section: Related Work
mentioning
confidence: 99%