Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.41
mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

Abstract: The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
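Below is a minimal sketch of the text-to-text usage pattern the abstract describes, using the Hugging Face transformers library. The checkpoint name "google/mt5-small" and the prompt are illustrative assumptions rather than details from the paper; note that the released mT5 checkpoints are pre-trained only (no supervised tasks), so a prompt like this needs task-specific fine-tuning before it yields useful output.

```python
# Hedged sketch: loading a public mT5 checkpoint and running it in the
# unified text-to-text format. Checkpoint name and prompt are illustrative.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Every task is expressed as "text in, text out": the input is a plain string
# and the model's prediction is decoded back into a string.
inputs = tokenizer(
    "summarize: mT5 is a multilingual variant of T5 pre-trained on a "
    "Common Crawl-based corpus covering 101 languages.",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```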

Cited by 500 publications (281 citation statements) · References 28 publications
“…As a future direction for building empathetic chatbots, using text datasets alone is insufficient, and speech rhythm and facial expressions may be useful [86], [87]. Cross-lingual transfer learning: very recently, cross-lingual transfer learning has achieved improved results across several languages, including Arabic, with the help of pretrained multilingual models such as Multi-BERT [81,88] and AraT5 [84]. Indeed, languages that share specific morphosyntactic features tend to benefit from transfer learning.…”
Section: Discussion (mentioning)
confidence: 99%
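As a concrete illustration of the cross-lingual transfer recipe this excerpt alludes to, the sketch below fine-tunes a multilingual text-to-text model on a few labelled English examples and then applies it zero-shot to Arabic input. The checkpoint name, prompt format, tiny in-memory dataset, and hyperparameters are all illustrative assumptions, not the setup of the cited papers.

```python
# Hedged sketch of zero-shot cross-lingual transfer with a multilingual
# text-to-text model: fine-tune on English, evaluate on Arabic.
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Supervised examples in English only, in the text-to-text format
# (input string -> target string); a real setup would use a full dataset.
english_train = [
    ("classify sentiment: I loved this film.", "positive"),
    ("classify sentiment: The plot was dull.", "negative"),
]

model.train()
for source, target in english_train:
    batch = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss  # standard seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot evaluation on Arabic: the shared multilingual vocabulary lets the
# English-fine-tuned model be applied directly, with no Arabic training data.
model.eval()
arabic = tokenizer("classify sentiment: أحببت هذا الفيلم كثيرا", return_tensors="pt")
prediction = model.generate(**arabic, max_new_tokens=4)
print(tokenizer.decode(prediction[0], skip_special_tokens=True))
```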
“…Nevertheless, recent solutions are based on classical approaches, which are mainly limited to machine translation and manual feature engineering [5]. Additionally, in the last few years, several multilingual pretrained models have emerged, including mT5 [88] and mBART [89], which help in building multilingual conversational systems. However, such systems need multilingual dictionaries and datasets to be trained.…”
Section: Discussion (mentioning)
confidence: 99%
“…All the models and proposals discussed in this section are intended for the English language; however, there are many other languages that deserve attention. Some efforts have been made to consider other languages alongside English by means of multilingual models such as mBART [9] or mT5 [10]. Although these efforts are very convenient and useful in many cases, the performance of multilingual models is typically lower on languages that are underrepresented in the pretraining data or that differ greatly, in linguistic terms, from the most represented languages [13,14].…”
Section: Related Work (mentioning)
confidence: 99%
“…However, most of the models proposed in the literature, such as BART [6], PEGASUS [7], or T5 [8], are intended for the English language and are not directly applicable to other languages. Multilingual models such as mBART [9] or mT5 [10] have also been studied in the literature to address that language constraint, but despite their applicability being broader than that of monolingual models, their performance is typically lower, especially on languages that are underrepresented in the pretraining corpora or that differ greatly in linguistic terms from the most represented languages [11][12][13][14]. For minority languages like Catalan, the available data resources are much scarcer than for languages like English, Chinese, or Spanish. Additionally, multilingual models typically either do not include data from minority languages, or, if they do, its proportion in the pretraining sets is much lower than that of the majority languages.…”
Section: Introduction (mentioning)
confidence: 99%