Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.41
mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

Abstract: The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
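Below is a minimal sketch of the text-to-text usage pattern the abstract describes, using the Hugging Face transformers library. The checkpoint name "google/mt5-small" and the prompt are illustrative assumptions rather than details from the paper; note that the released mT5 checkpoints are pre-trained only (no supervised tasks), so a prompt like this needs task-specific fine-tuning before it yields useful output.

```python
# Hedged sketch: loading a public mT5 checkpoint and running it in the
# unified text-to-text format. Checkpoint name and prompt are illustrative.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Every task is expressed as "text in, text out": the input is a plain string
# and the model's prediction is decoded back into a string.
inputs = tokenizer(
    "summarize: mT5 is a multilingual variant of T5 pre-trained on a "
    "Common Crawl-based corpus covering 101 languages.",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```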

Cited by 500 publications (281 citation statements) · References 28 publications
“…As a future direction for building empathetic chatbots, using text datasets alone is insufficient, and speech rhythm and facial expressions may be useful [86], [87]. Cross-lingual transfer learning: very recently, cross-lingual transfer learning has achieved improved results across several languages, including Arabic, with the help of pretrained multilingual models such as Multi-BERT [81,88] and AraT5 [84]. Indeed, languages that share specific morphosyntactic features tend to benefit from transfer learning.…”
Section: Discussion (mentioning)
confidence: 99%
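As a concrete illustration of the cross-lingual transfer recipe this excerpt alludes to, the sketch below fine-tunes a multilingual text-to-text model on a few labelled English examples and then applies it zero-shot to Arabic input. The checkpoint name, prompt format, tiny in-memory dataset, and hyperparameters are all illustrative assumptions, not the setup of the cited papers.

```python
# Hedged sketch of zero-shot cross-lingual transfer with a multilingual
# text-to-text model: fine-tune on English, evaluate on Arabic.
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Supervised examples in English only, in the text-to-text format
# (input string -> target string); a real setup would use a full dataset.
english_train = [
    ("classify sentiment: I loved this film.", "positive"),
    ("classify sentiment: The plot was dull.", "negative"),
]

model.train()
for source, target in english_train:
    batch = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss  # standard seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot evaluation on Arabic: the shared multilingual vocabulary lets the
# English-fine-tuned model be applied directly, with no Arabic training data.
model.eval()
arabic = tokenizer("classify sentiment: أحببت هذا الفيلم كثيرا", return_tensors="pt")
prediction = model.generate(**arabic, max_new_tokens=4)
print(tokenizer.decode(prediction[0], skip_special_tokens=True))
```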
“…Nevertheless, recent solutions are based on classical approaches, which are mainly limited to machine translation and manual feature engineering [5]. Additionally, in the last few years, several multilingual pretrained models have emerged, including mT5 [88] and mBART [89], which help in building multilingual conversational systems. However, such systems need multilingual dictionaries and datasets to be trained.…”
Section: Discussion (mentioning)
confidence: 99%
“…All the models and proposals discussed in this section are intended for the English language; however, there are many other languages that deserve attention. Some efforts have been made to consider other languages alongside English by means of multilingual models such as mBART [9] or mT5 [10]. Although these efforts are very convenient and useful in many cases, the performance of multilingual models is typically lower on languages that are underrepresented in the pretraining data or that differ greatly, in linguistic terms, from the most represented languages [13,14].…”
Section: Related Work (mentioning)
confidence: 99%
“…However, most of the models proposed in the literature, such as BART [6], PEGASUS [7], or T5 [8], are intended for the English language and are not directly applicable to other languages. Multilingual models such as mBART [9] or mT5 [10] have also been studied in the literature to address that language constraint, but despite their applicability being broader than that of monolingual models, their performance is typically lower, especially on languages that are underrepresented in the pretraining corpora or that differ greatly in linguistic terms from the most represented languages [11][12][13][14]. For minority languages like Catalan, the available data resources are much scarcer than for languages like English, Chinese, or Spanish. Additionally, multilingual models typically either do not include data from minority languages, or, if they do, its proportion in the pretraining sets is much lower than that of the majority languages.…”
Section: Introduction (mentioning)
confidence: 99%