Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts written in varied natural language. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16× its size. Further, our approach attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models up to 6× its size. All prompts and trained models are available at github.com/bigscience-workshop/promptsource/ and huggingface.co/bigscience/T0pp.
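To make the prompting step concrete, here is a minimal sketch of how a supervised example can be mapped into a prompted text-to-text (input, target) pair. The template wording, field names, and yes/no verbalizer are illustrative assumptions, not actual promptsource templates (which are written in Jinja and vary per dataset).

```python
# Minimal sketch of mapping a supervised NLI example into a prompted
# text-to-text (input, target) pair. The template wording and the
# yes/no verbalizer are illustrative, not an actual T0 prompt.

def apply_template(example: dict) -> tuple:
    """Render one raw example as (prompted_input, target_text)."""
    prompted_input = (
        f'Given that "{example["premise"]}", '
        f'is it definitely correct that "{example["hypothesis"]}"?'
    )
    # Verbalizer: render the class index as a natural-language word,
    # so the task becomes ordinary text-to-text generation.
    target_text = ["yes", "no"][example["label"]]
    return prompted_input, target_text

example = {
    "premise": "No weapons of mass destruction found in Iraq yet.",
    "hypothesis": "Weapons of mass destruction found in Iraq.",
    "label": 1,  # 0 = entailment, 1 = not entailment
}
print(apply_template(example))
```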
Recently, a boom of papers has shown extraordinary progress in few-shot learning with various prompt-based models. Such success can give the impression that prompts help models learn faster in the same way that humans learn faster when provided with task instructions expressed in natural language. In this study, we experiment with over 30 prompts manually written for natural language inference (NLI). We find that models learn just as fast with many prompts that are intentionally irrelevant or even pathologically misleading as they do with instructively "good" prompts. Additionally, we find that model performance is more dependent on the choice of the LM target words (a.k.a. the "verbalizer" that converts LM vocabulary predictions to class labels) than on the text of the prompt itself. In sum, we find little evidence that existing prompt-based models truly understand the meaning of their given prompts.

Introduction
Suppose a human is given two sentences: "No weapons of mass destruction found in Iraq yet." and "Weapons of mass destruction found in Iraq." They are then asked to respond 0 or 1 and receive a reward if they are correct. In this setup, they would likely need a large number of trials and errors before figuring out what they are really being rewarded to do. This setup is akin to the pretrain-and-fine-tune setup which has dominated NLP in recent years, in which models are asked to classify a sentence representation (e.g., a CLS token) into some arbitrary dimensions of a one-hot vector. In contrast, suppose a human is given a prompt such as: Given that "no weapons of mass destruction found in Iraq yet.", is it definitely correct that "weapons of mass destruction found in Iraq."? Then it would be no surprise that they are able to perform the task more accurately and without needing many examples to figure out what the task is.
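As a rough illustration of how a verbalizer turns an LM's vocabulary predictions into class labels, the sketch below scores each candidate target word with a seq2seq LM and returns the label of the best-scoring word. The checkpoint, verbalizer words, and scoring-by-loss approach are assumptions for illustration, not the paper's exact setup.

```python
# Sketch of verbalizer-based classification: score each candidate target
# word with a seq2seq LM and return the label of the best-scoring word.
# The checkpoint and verbalizer below are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small").eval()

def classify(prompted_input: str, verbalizer: dict) -> str:
    """verbalizer maps a target word (e.g. 'yes') to a class label."""
    enc = tokenizer(prompted_input, return_tensors="pt")
    scores = {}
    for word, label in verbalizer.items():
        target_ids = tokenizer(word, return_tensors="pt").input_ids
        with torch.no_grad():
            # Cross-entropy of the target word given the prompt;
            # lower loss means the LM finds this word more likely.
            loss = model(**enc, labels=target_ids).loss
        scores[label] = -loss.item()
    return max(scores, key=scores.get)

prompt = ('Given that "No weapons of mass destruction found in Iraq yet.", '
          'is it definitely correct that '
          '"Weapons of mass destruction found in Iraq."?')
print(classify(prompt, {"yes": "entailment", "no": "not_entailment"}))
```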
We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022. Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions that enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning; in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available.
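The balancing and mixed-prompt-setting ideas can be sketched as a simple mixture builder: cap the number of examples drawn per task, then format each example in a randomly chosen prompt setting. The cap value, formatting functions, and few-shot ratio below are illustrative assumptions, not the actual Flan 2022 configuration.

```python
# Illustrative mixture builder: cap examples per task (balancing) and
# mix prompt settings. The cap, ratio, and formatters are assumptions,
# not the actual Flan 2022 recipe.
import random

MAX_PER_TASK = 30_000  # per-task cap; the real value is a tuned choice

def format_zero_shot(ex):
    return (f"{ex['instruction']}\n{ex['input']}", ex["target"])

def format_few_shot(ex, demos):
    shots = "\n\n".join(f"{d['input']}\n{d['target']}" for d in demos)
    return (f"{ex['instruction']}\n{shots}\n\n{ex['input']}", ex["target"])

def build_mixture(tasks, few_shot_ratio=0.5):
    """tasks: {task_name: [examples]} -> shuffled list of (input, target)."""
    mixture = []
    for _name, examples in tasks.items():
        for ex in random.sample(examples, min(len(examples), MAX_PER_TASK)):
            if random.random() < few_shot_ratio:
                # A real pipeline would exclude `ex` from its own demos.
                demos = random.sample(examples, k=min(3, len(examples)))
                mixture.append(format_few_shot(ex, demos))
            else:
                mixture.append(format_zero_shot(ex))
    random.shuffle(mixture)
    return mixture
```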
Recently, a boom of papers has shown extraordinary progress in zero-shot and few-shot learning with various prompt-based models. It is commonly argued that prompts help models to learn faster in the same way that humans learn faster when provided with task instructions expressed in natural language. In this study, we experiment with over 30 prompt templates manually written for natural language inference (NLI). We find that models can learn just as fast with many prompts that are intentionally irrelevant or even pathologically misleading as they do with instructively "good" prompts. Further, such patterns hold even for models as large as 175 billion parameters (Brown et al., 2020), as well as the recently proposed instruction-tuned models which are trained on hundreds of prompts (Sanh et al., 2021). That is, instruction-tuned models often produce good predictions with irrelevant and misleading prompts even in zero-shot settings. In sum, notwithstanding prompt-based models' impressive improvement, we find evidence of serious limitations that question the degree to which such improvement is derived from models understanding task instructions in ways analogous to humans' use of task instructions. (Unabridged version available on arXiv. Code, interactive figures, and statistical test results available at https://github.com/awebson/prompt_semantics.)
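A sketch of the experimental design described above: evaluate one model under prompt templates grouped by category and compare accuracies. The irrelevant and misleading templates here are invented stand-ins in the spirit of the paper's categories, and `classify` refers to the hypothetical verbalizer-based scorer sketched earlier.

```python
# Sketch of the prompt-category comparison: run the same NLI examples
# through instructive, irrelevant, and misleading templates and compare
# accuracy. Templates are invented stand-ins, not the paper's actual ones.

TEMPLATES = {
    "instructive": 'Given that "{premise}", is it definitely correct that "{hypothesis}"?',
    "irrelevant": '"{premise}" Today is a beautiful day to learn French. "{hypothesis}"',
    "misleading": 'Does "{premise}" contain more words than "{hypothesis}"?',
}

VERBALIZER = {"yes": "entailment", "no": "not_entailment"}

def accuracy_by_category(dataset, classify):
    """Return {template_category: accuracy}; labels are the class
    strings produced by the verbalizer ('entailment'/'not_entailment')."""
    results = {}
    for category, template in TEMPLATES.items():
        correct = 0
        for ex in dataset:
            prompt = template.format(premise=ex["premise"],
                                     hypothesis=ex["hypothesis"])
            correct += classify(prompt, VERBALIZER) == ex["label"]
        results[category] = correct / len(dataset)
    return results
```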