Large language models are able to perform a task by conditioning on a few input-output demonstrations, a paradigm known as in-context learning. We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples. To explore this ability, we introduce the instruction induction challenge, compile a dataset consisting of 24 tasks, and define a novel evaluation metric based on executing the generated instruction. We discover that, to a large extent, the ability to generate instructions does indeed emerge when using a model that is both large enough and aligned to follow instructions; InstructGPT achieves 65.7% of human performance in our execution-based metric, while the original GPT-3 model reaches only 9.8% of human performance. This surprising result suggests that instruction induction might be a viable learning paradigm in and of itself, where instead of fitting a set of latent continuous parameters to the data, one searches for the best description in the natural language hypothesis space. Our code and data are publicly available at https://github.com/orhonovich/instruction-induction

We examine instruction induction on 24 tasks, ranging from morphosyntactic tasks (e.g., pluralization) to style transfer (e.g., formality) and sentiment analysis. As a basic evaluation protocol, we collect human annotations and use them as gold-standard references; the generated instructions are then compared to these references using BERTScore [Zhang et al., 2020]. Moreover, we suggest a novel evaluation metric for instruction induction: execution accuracy. The execution accuracy of a generated instruction is measured by testing whether LMs can correctly perform the task in a zero-shot manner by using the generated instruction alone, without any demonstrations.
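To make the execution-accuracy protocol concrete, the following is a minimal sketch, not the paper's exact implementation. The `complete` callable is a hypothetical wrapper around whatever LM completion API is used (e.g., an InstructGPT-style model), the prompt template is an assumption, and scoring here is simplified to case-insensitive exact match rather than the paper's per-task verification.

```python
# Minimal sketch of execution accuracy: run a generated instruction zero-shot
# on held-out (input, output) pairs and measure how often the model is correct.
# `complete` is a hypothetical LM-completion wrapper supplied by the caller.

from typing import Callable, Iterable, Tuple


def execute_instruction(instruction: str, x: str, complete: Callable[[str], str]) -> str:
    """Perform the task zero-shot: the generated instruction plus a single input,
    with no demonstrations in the prompt (prompt template is an assumption)."""
    prompt = f"Instruction: {instruction}\nInput: {x}\nOutput:"
    return complete(prompt).strip()


def execution_accuracy(
    instruction: str,
    test_pairs: Iterable[Tuple[str, str]],
    complete: Callable[[str], str],
) -> float:
    """Fraction of test pairs answered correctly when guided only by the instruction.
    Case-insensitive exact match is a simplification of per-task answer checking."""
    pairs = list(test_pairs)
    correct = sum(
        execute_instruction(instruction, x, complete).lower() == y.lower()
        for x, y in pairs
    )
    return correct / len(pairs)
```

The design choice behind this metric is that it judges an instruction by whether a model can actually follow it, rather than by its surface similarity to human-written references.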