2019
DOI: 10.48550/arxiv.1910.10683
Preprint

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Cited by 854 publications (1,365 citation statements). References 0 publications.
“…However, the success of the above DL-based methods relies heavily on large-scale datasets, posing a challenge for supervised and cross-domain text generation tasks. Since 2018, large-scale pretrained language models (PLMs) such as BERT [Devlin et al 2018], RoBERTa, GPT, T5 [Raffel et al 2019] and mBART [Liu et al 2020a] have gradually become a new paradigm of NLP. Owing to their use of large corpora and unsupervised learning based on the Transformer structure, PLMs are believed to have learned a great deal of semantic and syntactic knowledge from the data, and only fine-tuning is required for downstream tasks to achieve state-of-the-art (SOTA) performance.…”
Section: AI Chatbot Story Generation (mentioning)
Confidence: 99%
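The pretrain-then-fine-tune recipe described in this excerpt can be made concrete with a short sketch. The snippet below is not taken from any of the cited works; it assumes the Hugging Face `transformers` and `torch` packages, the public `t5-small` checkpoint, and a toy in-memory dataset, and only illustrates a pretrained text-to-text model being fine-tuned on a small supervised task.

```python
# Hedged sketch: fine-tuning a pretrained text-to-text model on a downstream task.
# Assumptions (not from the cited papers): Hugging Face transformers, t5-small,
# and two toy training pairs standing in for a real labeled dataset.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy supervised pairs; a real downstream task would iterate over a full dataset.
pairs = [
    ("sst2 sentence: the film is wonderful", "positive"),
    ("sst2 sentence: a tedious, joyless slog", "negative"),
]

model.train()
for source, target in pairs:
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    # Cross-entropy over the target tokens, as in standard seq2seq fine-tuning.
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```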
“…Seq2seq Models: The seq2seq models use both the encoder and the decoder of the Transformer for greater model flexibility. Currently, the most representative models of this type include T5 [Raffel et al 2019] and mBART [Liu et al 2020a]. In principle, almost all pre-training tasks used in autoencoding (AE) and autoregressive (AR) models can be adapted to the seq2seq models.…”
Section: Output: Story Paragraph (mentioning)
Confidence: 99%
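As a purely illustrative view of the encoder-decoder layout this excerpt describes, the sketch below, assuming Hugging Face `transformers`, `torch`, and the `t5-small` checkpoint, runs the two halves of a seq2seq model separately: the encoder maps the source text to hidden states, and the decoder produces next-token logits conditioned on them.

```python
# Hedged sketch of the two halves of a Transformer seq2seq model (assumed stack:
# Hugging Face transformers + t5-small); this is not the cited authors' code.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

enc_in = tokenizer("summarize: a long source document goes here ...", return_tensors="pt")

# Encoder half: contextual representations of the whole source sequence.
encoder_out = model.get_encoder()(**enc_in)
print(encoder_out.last_hidden_state.shape)   # (batch, source_len, d_model)

# Decoder half: one autoregressive step, attending to the encoder states.
start = torch.full((1, 1), model.config.decoder_start_token_id, dtype=torch.long)
step = model(encoder_outputs=encoder_out, decoder_input_ids=start)
print(step.logits.shape)                     # (batch, 1, vocab_size)
```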
“…Gradually, neural models equipped with a copy mechanism have been replaced by pretrained models, such as PEGASUS [59] for abstractive summarization, as well as MASS [60] and BART [61] for general sequence-to-sequence tasks. Based on the Transformer and transfer learning, universal models represented by T5 [62] have been proposed, which are intended to solve most common NLP tasks at once. As a reflection on text summarization, SUMMEVAL [63] intends to resolve critical shortcomings in evaluation methods.…”
Section: Related Work (mentioning)
Confidence: 99%
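To make the "one universal model for many tasks" idea concrete, the hedged sketch below, again assuming Hugging Face `transformers` and the public `t5-small` checkpoint, sends three of the task prefixes used in the T5 paper to the same model; the prefix selects the task while the interface stays text-in, text-out.

```python
# Hedged sketch of T5's unified text-to-text interface (assumed stack:
# Hugging Face transformers + t5-small); prompts use prefixes from the T5 paper.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One checkpoint, several tasks, all phrased as text-to-text via a prefix.
prompts = [
    "translate English to German: That is good.",
    "cola sentence: The course is jumping well.",  # linguistic acceptability
    "summarize: state authorities dispatched emergency crews to survey the damage ...",
]
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```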
“…Second, although most previous language QA models follow a span-based answer prediction paradigm [28,43,59,61], it is impractical in our open-domain setting since there is no ground-truth supporting fact in our task, let alone the ground-truth answer span for prediction. On the other hand, recent work shows that a generative encoder-decoder network can achieve state-of-the-art performance on multiple open-domain QA datasets [41], and it avoids span prediction and directly generates a free-form answer.…”
Section: Generative Multi-passages QA (mentioning)
Confidence: 99%
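The generative alternative to span prediction that this excerpt refers to can be sketched as follows. This is only an illustration under assumed tooling (Hugging Face `transformers`, the `t5-small` checkpoint, and a made-up question and passages), not the cited systems' actual pipeline: the question and retrieved evidence are concatenated and the decoder generates a free-form answer string rather than predicting a span.

```python
# Hedged sketch of generative (non span-based) open-domain QA; assumptions:
# Hugging Face transformers, t5-small, and illustrative question/passages.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "Who wrote the novel Nineteen Eighty-Four?"
passages = [
    "Nineteen Eighty-Four is a dystopian novel published in 1949.",
    "George Orwell wrote Nineteen Eighty-Four while living on the island of Jura.",
]

# No answer span is extracted; the model reads question + evidence and
# generates an answer string token by token.
prompt = "question: " + question + " context: " + " ".join(passages)
ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids
answer_ids = model.generate(ids, max_new_tokens=16)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```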