2019
DOI: 10.48550/arxiv.1910.10683
Preprint

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Cited by 854 publications (1,365 citation statements). References 0 publications.
“…However, the success of the above DL-based methods relies heavily on large-scale datasets, posing a challenge for supervised and cross-domain text generation tasks. Since 2018, large-scale pretrained language models (PLMs) such as BERT [Devlin et al 2018], RoBERTa, GPT, T5 [Raffel et al 2019] and mBART [Liu et al 2020a] have gradually become a new paradigm of NLP. Owing to their use of large corpora and unsupervised learning based on the Transformer structure, PLMs are believed to have learned a great deal of semantic and syntactic knowledge from the data, and only fine-tuning is required for downstream tasks to achieve state-of-the-art (SOTA) performance.…”
Section: AI Chatbot Story Generation (mentioning)
Confidence: 99%
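The pretrain-then-fine-tune recipe described in this excerpt can be made concrete with a short sketch. The snippet below is not taken from any of the cited works; it assumes the Hugging Face `transformers` and `torch` packages, the public `t5-small` checkpoint, and a toy in-memory dataset, and only illustrates a pretrained text-to-text model being fine-tuned on a small supervised task.

```python
# Hedged sketch: fine-tuning a pretrained text-to-text model on a downstream task.
# Assumptions (not from the cited papers): Hugging Face transformers, t5-small,
# and two toy training pairs standing in for a real labeled dataset.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy supervised pairs; a real downstream task would iterate over a full dataset.
pairs = [
    ("sst2 sentence: the film is wonderful", "positive"),
    ("sst2 sentence: a tedious, joyless slog", "negative"),
]

model.train()
for source, target in pairs:
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    # Cross-entropy over the target tokens, as in standard seq2seq fine-tuning.
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```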
“…Seq2seq Models: The seq2seq models use both the encoder and the decoder of the Transformer for greater model flexibility. Currently, the most representative models of this type include T5 [Raffel et al 2019] and mBART [Liu et al 2020a]. In principle, almost all pre-training tasks used in autoencoding (AE) and autoregressive (AR) models can be adapted to the seq2seq models.…”
Section: Output: Story Paragraph (mentioning)
Confidence: 99%
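As a purely illustrative view of the encoder-decoder layout this excerpt describes, the sketch below, assuming Hugging Face `transformers`, `torch`, and the `t5-small` checkpoint, runs the two halves of a seq2seq model separately: the encoder maps the source text to hidden states, and the decoder produces next-token logits conditioned on them.

```python
# Hedged sketch of the two halves of a Transformer seq2seq model (assumed stack:
# Hugging Face transformers + t5-small); this is not the cited authors' code.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

enc_in = tokenizer("summarize: a long source document goes here ...", return_tensors="pt")

# Encoder half: contextual representations of the whole source sequence.
encoder_out = model.get_encoder()(**enc_in)
print(encoder_out.last_hidden_state.shape)   # (batch, source_len, d_model)

# Decoder half: one autoregressive step, attending to the encoder states.
start = torch.full((1, 1), model.config.decoder_start_token_id, dtype=torch.long)
step = model(encoder_outputs=encoder_out, decoder_input_ids=start)
print(step.logits.shape)                     # (batch, 1, vocab_size)
```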
“…Gradually, neural models equipped with a copy mechanism have been replaced by pretrained models, such as PEGASUS [59] for abstractive summarization, as well as MASS [60] and BART [61] for general sequence-to-sequence tasks. Based on the Transformer and transfer learning, universal models represented by T5 [62] have been proposed, which are intended to solve most common NLP tasks at once. As a reflection on text summarization, SUMMEVAL [63] intends to resolve critical shortcomings in evaluation methods.…”
Section: Related Work (mentioning)
Confidence: 99%
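To make the "one universal model for many tasks" idea concrete, the hedged sketch below, again assuming Hugging Face `transformers` and the public `t5-small` checkpoint, sends three of the task prefixes used in the T5 paper to the same model; the prefix selects the task while the interface stays text-in, text-out.

```python
# Hedged sketch of T5's unified text-to-text interface (assumed stack:
# Hugging Face transformers + t5-small); prompts use prefixes from the T5 paper.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One checkpoint, several tasks, all phrased as text-to-text via a prefix.
prompts = [
    "translate English to German: That is good.",
    "cola sentence: The course is jumping well.",  # linguistic acceptability
    "summarize: state authorities dispatched emergency crews to survey the damage ...",
]
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```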
“…Second, although most previous language QA models follow a span-based answer prediction paradigm [28,43,59,61], it is impractical in our open-domain setting since there is no ground-truth supporting fact in our task, let alone the ground-truth answer span for prediction. On the other hand, recent work shows that a generative encoder-decoder network can achieve state-of-the-art performance on multiple open-domain QA datasets [41], and it avoids span prediction and directly generates a free-form answer.…”
Section: Generative Multi-passages QA (mentioning)
Confidence: 99%
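The generative alternative to span prediction that this excerpt refers to can be sketched as follows. This is only an illustration under assumed tooling (Hugging Face `transformers`, the `t5-small` checkpoint, and a made-up question and passages), not the cited systems' actual pipeline: the question and retrieved evidence are concatenated and the decoder generates a free-form answer string rather than predicting a span.

```python
# Hedged sketch of generative (non span-based) open-domain QA; assumptions:
# Hugging Face transformers, t5-small, and illustrative question/passages.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "Who wrote the novel Nineteen Eighty-Four?"
passages = [
    "Nineteen Eighty-Four is a dystopian novel published in 1949.",
    "George Orwell wrote Nineteen Eighty-Four while living on the island of Jura.",
]

# No answer span is extracted; the model reads question + evidence and
# generates an answer string token by token.
prompt = "question: " + question + " context: " + " ".join(passages)
ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids
answer_ids = model.generate(ids, max_new_tokens=16)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```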