2022
DOI: 10.48550/arxiv.2204.07705
Preprint
Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

Abstract: How can we measure the generalization of models to a variety of unseen tasks when provided with their language instructions? To facilitate progress in this goal, we introduce NATURAL-INSTRUCTIONS v2, a benchmark of 1,600+ diverse language tasks and their expert-written instructions. It covers 70+ distinct task types, such as tagging, in-filling, and rewriting. These tasks are collected with contributions of NLP practitioners in the community and through an iterative peer review process to ensure their quality.…

Cited by 6 publications (17 citation statements)
References 12 publications (12 reference statements)
“…In preliminary experiments, we found that T0 was not able to perform few-shot in-context learning: performance actually decreased as we increased the number of in-context examples. This is likely because of the zero-shot format used during multitask prompted fine-tuning and corroborates a recent finding by [10].…”
Section: Performance On T0 Tasks (supporting)
confidence: 88%
“…Performing ICL therefore solely relies on the capabilities that a model learned during pre-training. These characteristics have led to a great deal of recent interest in ICL methods [5][6][7][8][9][10].…”
Section: Introduction (mentioning)
confidence: 99%
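The cited passage refers to in-context learning (ICL), where the task is specified entirely through demonstrations placed in the prompt and the frozen model's pre-trained capabilities do the rest. A minimal Python sketch of this prompt construction follows; the function name and formatting are illustrative assumptions, not taken from the cited works.

```python
# Minimal sketch of few-shot in-context learning (ICL): the task is conveyed
# only through (input, output) demonstrations in the prompt; the language
# model's parameters are never updated, so performance depends entirely on
# capabilities acquired during pre-training.
from typing import List, Tuple


def build_icl_prompt(demonstrations: List[Tuple[str, str]], query: str) -> str:
    """Concatenate demonstrations followed by the unanswered query."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demonstrations]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


# Usage: two sentiment demonstrations and one test query. The resulting
# prompt would be fed to a frozen pre-trained language model for completion.
demos = [
    ("The movie was a delight.", "positive"),
    ("I want my money back.", "negative"),
]
print(build_icl_prompt(demos, "An instant classic."))
```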
“…The Flan 2022 Collection offers the most extensive publicly available set of tasks and methods for instruction tuning, which we have compiled in one place and supplemented with hundreds more high-quality templates and richer formatting patterns. We show that a model trained on this collection outperforms other public collections on all tested evaluation benchmarks, including the original Flan 2021 (Wei et al., 2021), T0++ (Sanh et al., 2021), Super-Natural Instructions (Wang et al., 2022c), and the concurrent work on OPT-IML (Iyer et al., 2022). As shown in Figure 1, this includes improvements of 4.2%+ and 8.5% on the MMLU (Hendrycks et al., 2020) and BIG-Bench Hard (Suzgun et al., 2022) evaluation benchmarks, respectively, for equally sized models.…”
Section: Introduction (mentioning)
confidence: 82%
“…To facilitate the same interface for various customized visual tasks in the wild, it is desirable to have the same uniform task instruction schema. In NLP, all task instructions can follow the same uniform schema, composed of task definition and positive/negative examples [50,70]. Here, the task definition defines a given task in natural language, completely specifying how an input is expected to be mapped to an output text.…”
Section: Retrieval-Augmented Task Instruction (mentioning)
confidence: 99%
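The uniform schema described in the citation above (a natural-language task definition plus positive and negative examples) lends itself to a simple data structure. The sketch below is an illustration in Python; the field names and prompt layout are assumptions for clarity, not the benchmark's official format.

```python
# Illustrative sketch of a uniform task-instruction schema in the spirit of
# NATURAL-INSTRUCTIONS v2: a task definition in natural language plus
# positive/negative demonstration examples. Field names are assumed.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Example:
    input: str
    output: str
    explanation: str = ""  # optional rationale, e.g. why a negative example is wrong


@dataclass
class TaskInstruction:
    definition: str  # fully specifies how an input maps to an output text
    positive_examples: List[Example] = field(default_factory=list)
    negative_examples: List[Example] = field(default_factory=list)

    def render_prompt(self, instance_input: str) -> str:
        """Render the instruction plus one unanswered instance as a single prompt."""
        parts = [f"Definition: {self.definition}"]
        for ex in self.positive_examples:
            parts.append(f"Positive Example:\nInput: {ex.input}\nOutput: {ex.output}")
        for ex in self.negative_examples:
            block = f"Negative Example:\nInput: {ex.input}\nOutput: {ex.output}"
            if ex.explanation:
                block += f"\nExplanation: {ex.explanation}"
            parts.append(block)
        parts.append(f"Now complete the following:\nInput: {instance_input}\nOutput:")
        return "\n\n".join(parts)


# Usage: a toy sentiment-tagging task rendered into a prompt.
task = TaskInstruction(
    definition="Given a product review, label its sentiment as 'positive' or 'negative'.",
    positive_examples=[Example("The battery lasts all day.", "positive")],
    negative_examples=[Example("Terrible build quality.", "positive",
                               "The review is clearly negative.")],
)
print(task.render_prompt("Shipping was slow and the box arrived damaged."))
```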