2021
DOI: 10.48550/arxiv.2109.07830
Preprint

Reframing Instructional Prompts to GPTk's Language

Abstract: How can model designers turn task instructions into effective prompts for language models? Backed by extensive empirical analysis on GPT3, we observe important features for successful instructional prompts, and propose several reframing techniques for model designers to create such prompts. For example, a complex task can be decomposed into multiple simpler tasks (Fig. 1a). We experiment over 12 NLP tasks across 6 diverse categories (question generation, classification, etc.). Our results show that reframing improves few-shot learning performance by 14% while reducing sample complexity over existing few-shot baselines.
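The decomposition reframing mentioned in the abstract is easy to picture in code. The sketch below is a minimal illustration, not the paper's implementation: `complete` is a hypothetical stand-in for any LM text-completion API, and the question-generation prompts are invented for the example.

```python
# Minimal sketch of decomposition reframing: instead of one complex
# instruction, the task is split into simpler sequential prompts.
# `complete` is a hypothetical stand-in for any LM completion call.

def complete(prompt: str) -> str:
    raise NotImplementedError("wire this to an LM completion API")

def generate_question_direct(passage: str) -> str:
    # Original, complex instruction: one prompt does everything.
    prompt = (
        "Write a question based on the passage whose answer is a "
        f"named entity mentioned in it.\n\nPassage: {passage}\nQuestion:"
    )
    return complete(prompt)

def generate_question_decomposed(passage: str) -> str:
    # Reframed: two simpler sub-tasks, each with a narrow instruction.
    entity = complete(
        f"List one named entity mentioned in the passage.\n\n"
        f"Passage: {passage}\nEntity:"
    )
    question = complete(
        f"Write a question about the passage whose answer is \"{entity.strip()}\".\n\n"
        f"Passage: {passage}\nQuestion:"
    )
    return question
```

Each sub-prompt asks the model for one narrow decision, which is the property the paper's analysis associates with more reliable GPT3 behavior.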

Cited by 22 publications (34 citation statements)
References 18 publications
“…Prior work proposes better ways of formulating the problem (Zhao et al., 2021; Holtzman et al., 2021; Min et al., 2021a), better ways of choosing labeled examples for the demonstrations (Liu et al., 2021; Lu et al., 2021; Rubin et al., 2021), meta-training with an explicit in-context learning objective (Min et al., 2021b), and learning to follow instructions as a variant of in-context learning (Mishra et al., 2021b; Efrat and Levy, 2020; Wei et al., 2022; Sanh et al., 2022). At the same time, some work reports brittleness and over-sensitivity for in-context learning (Lu et al., 2021; Zhao et al., 2021; Mishra et al., 2021a).…”
Section: Related Work
confidence: 99%
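The "choosing labeled examples for the demonstrations" line of work quoted above can be sketched as a retrieval step before building the few-shot prompt. This is a minimal sketch, assuming a caller-supplied sentence encoder `embed`; the cosine-similarity ranking is the common heuristic, not any one cited paper's exact method.

```python
# Sketch of similarity-based demonstration selection for in-context
# learning: retrieve the k training examples closest to the test input
# and concatenate them into a few-shot prompt. Purely illustrative.
from typing import Callable, List, Tuple
import math

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def build_prompt(
    test_input: str,
    pool: List[Tuple[str, str]],          # (input, label) pairs
    embed: Callable[[str], List[float]],  # assumed sentence encoder
    k: int = 4,
) -> str:
    q = embed(test_input)
    ranked = sorted(pool, key=lambda ex: cosine(embed(ex[0]), q), reverse=True)
    demos = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in ranked[:k])
    return f"{demos}\n\nInput: {test_input}\nLabel:"
```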
“…The release of GPT-3 (Brown et al., 2020) sparked a lot of excitement about the emergent ability of LMs to follow discrete natural language prompts. Consequently, countless follow-up studies have used manually designed discrete prompts for probing LMs (Petroni et al., 2019; Jiang et al., 2020), improving LMs' few-shot ability (Schick and Schütze, 2021; Gao et al., 2021; Le Scao and Rush, 2021), and their zero-shot ability as well as transferability (Mishra et al., 2021a; Reynolds and McDonell, 2021). While discrete prompts have clear advantages, in addition to being human-readable and thus easily interpretable, we do not have efficient and algorithmic ways of reconstructing them.…”
Section: Related Work
confidence: 99%
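Probing an LM with a manually designed discrete prompt, as in the quote above, typically amounts to a cloze query. A minimal sketch using the Hugging Face `fill-mask` pipeline; the model choice and query are illustrative:

```python
# Sketch of probing a masked LM with a hand-written discrete prompt
# (LAMA-style cloze query): rank the model's fillers for the blank.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for cand in fill("The capital of France is [MASK]."):
    print(cand["token_str"], round(cand["score"], 3))
```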
“…For example, Shin et al. (2020)'s algorithm discovers discrete prompts, alas the results are not human-readable. Prior work also finds that model performance is highly sensitive to small changes in wording (Mishra et al., 2021a), and optimization over the discrete space is non-trivial and often highly unstable. Our findings here about the disconnect between continuous prompts and their discrete interpretation provide another perspective on the difficulty of discovering discrete prompts via continuous optimization algorithms that (directly or indirectly) leverage the continuous space (more discussion in §6).…”
Section: Related Work
confidence: 99%
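The "discrete interpretation" of a continuous prompt discussed above is commonly obtained by projecting each soft-prompt vector onto its nearest vocabulary embedding. A sketch of that idea in PyTorch; the nearest-neighbor projection is a common convention, not any single cited paper's exact procedure:

```python
# Sketch: interpret a trained continuous prompt by mapping each soft
# token to the nearest entry of the model's word-embedding matrix.
import torch
import torch.nn.functional as F

def nearest_tokens(soft_prompt: torch.Tensor,      # (p, d) soft prompt
                   embedding_matrix: torch.Tensor  # (V, d) vocab embeddings
                   ) -> torch.Tensor:
    # Cosine similarity between each soft token and every vocab vector.
    sp = F.normalize(soft_prompt, dim=-1)
    em = F.normalize(embedding_matrix, dim=-1)
    sims = sp @ em.T            # (p, V) similarity scores
    return sims.argmax(dim=-1)  # nearest vocab token id per prompt slot
```

Decoding the returned ids with the model's tokenizer yields the discrete "reading" of the prompt, which, per the quote, often bears little relation to what the continuous prompt actually does.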
“…This hard limit on the number of tokens is problematic in tasks where examples include structured (e.g., MWoZ) or unstructured data (e.g., WoW, WiT, MSC), since each shot in the prompt requires thousands of tokens. To overcome this challenge, there are several possible alternatives to be explored: 1) improving the task description (Mishra et al., 2021; Reynolds and McDonell, 2021) rather than increasing the number of shots, 2) using prompt tuning (Li and Liang, 2021; Lester et al., 2021; Logan IV et al., 2021), where more shots would help since prompts are trained continuous embeddings, 3) using adapter tuning (Houlsby et al., 2019), 4) converting the task into ….…”
Section: Limited Number Of Shots
confidence: 99%
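Prompt tuning, alternative (2) in the quote above, trains a small matrix of continuous prompt embeddings prepended to the input while the LM itself stays frozen. A minimal PyTorch sketch of the idea; the prompt length, initialization scale, and the assumption that the backbone accepts input embeddings directly are illustrative:

```python
# Sketch of prompt tuning: only the prepended soft-prompt embeddings
# receive gradients; the backbone LM's parameters stay frozen.
import torch
import torch.nn as nn

class PromptTunedLM(nn.Module):
    def __init__(self, lm: nn.Module, d_model: int, prompt_len: int = 20):
        super().__init__()
        self.lm = lm  # frozen backbone, assumed to accept input embeddings
        for p in self.lm.parameters():
            p.requires_grad = False
        # The only trainable parameters: (prompt_len, d_model) soft prompt.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq, d_model); prepend the shared prompt.
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.lm(torch.cat([prompt, input_embeds], dim=1))
```

Because only `soft_prompt` is updated, adding "shots" during training costs no prompt-window tokens at inference beyond the fixed `prompt_len`, which is why the quoted passage suggests it as a way around the hard token limit.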