2021 · Preprint
DOI: 10.48550/arxiv.2110.08207

Multitask Prompted Training Enables Zero-Shot Task Generalization

Abstract: Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. We convert a large set of supervis…
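The abstract describes a system for mapping natural language tasks into a human-readable prompted form. As a minimal sketch only, with a hypothetical template and helper function that are not the paper's actual prompt collection, "applying a prompt" to one example could look like this:

```python
# Hypothetical template and helper; not the paper's actual prompt collection.

def apply_prompt(template: str, example: dict) -> str:
    """Fill a natural-language prompt template with one example's fields."""
    return template.format(**example)

nli_template = 'Suppose "{premise}" Can we infer that "{hypothesis}"? Yes, no, or maybe?'

example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "Someone is performing music.",
}

print(apply_prompt(nli_template, example))
# Suppose "A man is playing a guitar on stage." Can we infer that
# "Someone is performing music."? Yes, no, or maybe?
```

The prompted input and its natural-language answer then serve directly as a text-to-text training pair.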

Cited by 86 publications (149 citation statements) · References 22 publications

“…Large language models can generalize to these unseen instructions, obtaining reasonable performance in a wide variety of tasks. Moreover, recent works (Sanh et al., 2021) have shown that we can improve the performance of this instruction-following behavior by fine-tuning on a multi-task mixture using natural language descriptions of the tasks, which mirrors closely the results from the multilingual MT literature.…”
Section: Introduction (supporting)
confidence: 78%
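The statement above describes fine-tuning on a multi-task mixture where each task is expressed through a natural-language description. A rough sketch of assembling such a mixture is given below; the task names, templates, and toy records are invented for illustration and are not the actual training mixture used in any of the cited works.

```python
import random

# Hedged sketch: build a multi-task training mixture in which every task is
# expressed through a natural-language prompt. Task names, templates, and
# records below are invented for illustration.

TEMPLATES = {
    "sentiment": "Review: {text}\nIs this review positive or negative?",
    "summarization": "Summarize the following article:\n{text}",
    "nli": '{premise}\nQuestion: does this imply "{hypothesis}"? yes or no?',
}

DATASETS = {
    "sentiment": [{"text": "Great film!", "label": "positive"}],
    "summarization": [{"text": "A long article ...", "label": "A short summary."}],
    "nli": [{"premise": "A dog runs.", "hypothesis": "An animal moves.",
             "label": "yes"}],
}

def build_mixture():
    """Flatten all tasks into (input, target) text pairs and interleave them."""
    mixture = []
    for task, examples in DATASETS.items():
        for ex in examples:
            prompt = TEMPLATES[task].format(**ex)   # fill the prompt template
            mixture.append({"input": prompt, "target": ex["label"]})
    random.shuffle(mixture)                          # mix tasks together
    return mixture

for pair in build_mixture():
    print(pair["input"], "->", pair["target"])
```

Interleaving the flattened examples keeps each training batch task-diverse, matching the multi-task mixture described in the quote.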
“…In this formulation, we could formulate our prompts as Translate to {language_name}: {input_slot}. Sanh et al. (2021) have shown this ability to follow natural language instructions can be improved by finetuning the model on a diverse mixture of tasks. All of these works have typically focused on large, English-centric language models.…”
Section: Related Work (mentioning)
confidence: 99%
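The quoted work writes its machine translation prompts as Translate to {language_name}: {input_slot}. Rendering that pattern is plain string templating; the snippet below is a hypothetical illustration with made-up inputs.

```python
# The quoted prompt pattern, rendered with hypothetical inputs. The language
# and sentence below are illustrative only.

MT_TEMPLATE = "Translate to {language_name}: {input_slot}"

def make_mt_prompt(language_name: str, source_sentence: str) -> str:
    """Fill the translation prompt with a target language and a source sentence."""
    return MT_TEMPLATE.format(language_name=language_name, input_slot=source_sentence)

print(make_mt_prompt("German", "The weather is nice today."))
# -> Translate to German: The weather is nice today.
```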
“…Motivated by this idea, we combine approaches from SOLOIST, MUPPET, and T0 for PrefineDST in an attempt to train a robust DST model through prefinetuning (Peng et al., 2020a; Aghajanyan et al., 2021; Sanh et al., 2021). We choose prefinetuning tasks based on their intuitive potential for improving on qualities measured by CheckDST and uniformly format these non-target datasets as text-to-text generation tasks with the help of instruction prompts.…”
Section: PrefineDST (mentioning)
confidence: 99%
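The quote describes uniformly formatting heterogeneous non-target datasets as text-to-text generation tasks with instruction prompts. The sketch below illustrates that idea under assumptions: the instruction strings, task names, and example record are hypothetical rather than the actual PrefineDST data.

```python
# Hedged sketch: serialize heterogeneous examples into a single text-to-text
# format with an instruction prefix. Instruction strings, task names, and the
# example record are hypothetical.

INSTRUCTIONS = {
    "intent": "Identify the intent of the user's utterance.",
    "qa": "Answer the question using the passage.",
}

def to_text_to_text(task: str, fields: dict, target: str) -> dict:
    """Return one (source, target) text pair with the task's instruction prepended."""
    body = " ".join(f"{key}: {value}" for key, value in fields.items())
    return {"source": f"{INSTRUCTIONS[task]} {body}", "target": target}

print(to_text_to_text("intent",
                      {"utterance": "Book me a table for two at 7pm."},
                      "restaurant_reservation"))
# {'source': "Identify the intent of the user's utterance. utterance: Book me
#  a table for two at 7pm.", 'target': 'restaurant_reservation'}
```

Because every task shares the same (source text, target text) interface, no task-specific output layers are needed during prefinetuning.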
“…The most similar work to PrefineDST is MUPPET, a BART model prefinetuned on more than 50 heterogeneous tasks via additional layers that accommodate different task structures (Aghajanyan et al., 2021). We adapt the multitasking approach of Sanh et al. (2021) to remove the additional layers used by MUPPET.…”
Section: Related Work (mentioning)
confidence: 99%