Alexander M. Rush scite author profile

Multitask Prompted Training Enables Zero-Shot Task Generalization

Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts using varying natural language. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16× its size. Further, our approach attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models up to 6× its size. All prompts and trained models are available at github.com/bigscience-workshop/promptsource/ and huggingface.co/bigscience/T0pp.
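The idea of mapping a supervised example into several human-readable prompted forms can be sketched as follows. This is a minimal illustration only, not the promptsource API: the `PromptTemplate` class, the template wording, the field names (`premise`, `hypothesis`, `label`), and the label order are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PromptTemplate:
    """Hypothetical template: turns one labeled example into (input text, target text)."""
    name: str
    input_pattern: str                 # uses {field} placeholders from the example
    answer_choices: Tuple[str, ...]    # answer_choices[label] is the target text

    def apply(self, example: Dict) -> Tuple[str, str]:
        # Fill the placeholders with the example's fields; unused fields are ignored.
        prompt = self.input_pattern.format(**example)
        target = self.answer_choices[example["label"]]
        return prompt, target


# Two differently worded prompts for the same (assumed) inference task.
# Assumed label order: 0 = entailment, 1 = neutral, 2 = contradiction.
TEMPLATES = [
    PromptTemplate(
        name="can_we_infer",
        input_pattern='Suppose "{premise}" Can we infer that "{hypothesis}"? Yes, no, or maybe?',
        answer_choices=("Yes", "Maybe", "No"),
    ),
    PromptTemplate(
        name="true_false_neither",
        input_pattern="{premise} Question: {hypothesis} True, False, or Neither?",
        answer_choices=("True", "Neither", "False"),
    ),
]


def to_prompted(example: Dict, templates: List[PromptTemplate]) -> List[Tuple[str, str]]:
    """Return one (input, target) text pair per template for a single example."""
    return [template.apply(example) for template in templates]


if __name__ == "__main__":
    example = {"premise": "A dog runs.", "hypothesis": "An animal moves.", "label": 0}
    for prompt, target in to_prompted(example, TEMPLATES):
        print(prompt, "->", target)
```

Each labeled example yields one text-to-text pair per template, which is the shape of data an encoder-decoder model can be fine-tuned on.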


A Neural Attention Model for Abstractive Sentence Summarization

Rush, Chopra, Weston. 2015. Preprint.

106

120


Sequence-Level Knowledge Distillation

Kim, Rush. 2016. Preprint.

119

109


OpenNMT: Open-Source Toolkit for Neural Machine Translation

Klein, Kim, Deng, et al. 2017. Preprint.

106

101


Character-Aware Neural Language Models

Kim, Jernite, Sontag, et al. 2016. AAAI.

426


We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.
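The highway network mentioned in the abstract can be illustrated with a single layer in plain Python. This is a sketch under stated assumptions, not the authors' implementation: it uses the standard highway formulation, y = t * g(W_h x + b_h) + (1 - t) * x with gate t = sigmoid(W_t x + b_t), and assumes a ReLU for g. The function and parameter names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W x + b for a row-major weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x: Vector, W_h: Matrix, b_h: Vector, W_t: Matrix, b_t: Vector) -> Vector:
    """One highway layer: a gated mix of a transformed input and the input itself."""
    # Candidate transform, assuming a ReLU nonlinearity.
    h = [max(0.0, v) for v in affine(W_h, x, b_h)]
    # Transform gate in (0, 1); values near 0 carry the input through unchanged.
    t = [1.0 / (1.0 + math.exp(-v)) for v in affine(W_t, x, b_t)]
    # Input and output must share a dimension so the carry term is well defined.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

A strongly negative gate bias makes the layer behave almost like the identity, which is why such layers can be stacked over the CNN output without losing the original character features.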



