2020
DOI: 10.48550/arxiv.2010.11934
Preprint

mT5: A massively multilingual pre-trained text-to-text transformer

Abstract: The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available.
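As a rough illustration of the text-to-text setup the abstract describes, the sketch below loads a released mT5 checkpoint through the Hugging Face transformers library (an assumption about tooling, not necessarily the toolkit used in the paper). The released checkpoints are pre-trained only, so in practice the model would be fine-tuned on a downstream task before its generations are meaningful; the prompt here is purely illustrative.

```python
# Minimal sketch of the unified text-to-text interface: every task is framed
# as text in -> text out, so generation replaces task-specific heads.
# Assumes the Hugging Face `transformers` and `sentencepiece` packages and the
# publicly released "google/mt5-small" checkpoint.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Illustrative prompt only: the raw pre-trained checkpoint has seen no
# supervised task data, so expect useful output only after fine-tuning.
inputs = tokenizer(
    "summarize: mT5 is a multilingual variant of T5 pre-trained on a "
    "Common Crawl-based corpus covering 101 languages.",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```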

Cited by 97 publications (103 citation statements)
References 36 publications (38 reference statements)
“…For example, T5 (Raffel et al., 2019) demonstrated that many language tasks previously addressed with separate models could be addressed using a single text-to-text encoder-decoder Transformer model. Extending this approach, mT5 (Xue et al., 2020) used a single Transformer to model multiple languages, demonstrating that a unified architecture could also serve as a general multilingual model, leveraging high-resource language datasets to improve model performance on lower-resource datasets.…”
Section: Transformers For Sequence Modeling
confidence: 99%
“…In addition to removing the cumbersome task of constructing specialized architectures and loss functions for different instrumentations and datasets, our general output vocabulary also allows our model to be trained on a mixture of several datasets simultaneously, similar to how multilingual translation models such as mT5 are trained on several languages (Xue et al., 2020). This approach not only simplifies model design and training, but also increases the amount and diversity of training data available to the model.…”
Section: Multi-task Mixture
confidence: 99%
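The "trained on a mixture of several datasets simultaneously" point above is, in mT5's case, handled by sampling languages with probability proportional to a boosted power of their corpus size (the paper uses an exponent of about 0.3). The snippet below is a minimal sketch of that kind of exponent-based mixing; the corpus sizes are illustrative assumptions and the helper name mixing_probabilities is hypothetical.

```python
import numpy as np

def mixing_probabilities(sizes, alpha=0.3):
    # Sample each dataset with probability proportional to |D|**alpha,
    # boosting low-resource datasets relative to their raw share.
    boosted = np.array(sizes, dtype=np.float64) ** alpha
    return boosted / boosted.sum()

# Illustrative per-language example counts (assumed, not from the paper).
sizes = {"en": 3_000_000, "sw": 30_000, "yo": 3_000}
probs = mixing_probabilities(list(sizes.values()))
total = sum(sizes.values())
for lang, p in zip(sizes, probs):
    print(f"{lang}: raw share {sizes[lang] / total:.4f}, sampled share {p:.4f}")
```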
“…We decode the output by searching for occurrences of the predicted acronyms and long-forms and detecting their character spans in the input text. We use mT5 for our experiments (Xue et al. 2021).…”
Section: Baselines
confidence: 99%
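As a sketch of the decoding step this quote describes, the function below locates the character spans of model-predicted strings (for example, an acronym and its long form) in the input text via substring search. The function name and the case-insensitive matching policy are assumptions for illustration, not details taken from the cited work.

```python
# Hypothetical span-recovery helper: map strings generated by a seq2seq model
# back to (start, end) character offsets in the source text.
def find_spans(text: str, predicted: list[str]) -> dict[str, list[tuple[int, int]]]:
    spans = {}
    lowered = text.lower()
    for phrase in predicted:
        hits, start = [], 0
        needle = phrase.lower()
        # Collect every (possibly overlapping) occurrence of the phrase.
        while (idx := lowered.find(needle, start)) != -1:
            hits.append((idx, idx + len(phrase)))
            start = idx + 1
        spans[phrase] = hits
    return spans

text = "Natural language processing (NLP) models benefit from scale; NLP is ubiquitous."
print(find_spans(text, ["NLP", "Natural language processing"]))
```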