Aditya Siddhant scite author profile

The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available. 1

show abstract

Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study

Siddhant¹,

Lipton²

2018

126

119

View full text Add to dashboard Cite

Several recent papers investigate Active Learning (AL) for mitigating the datadependence of deep learning for natural language processing. However, the applicability of AL to real-world problems remains an open question. While in supervised learning, practitioners can try many different methods, evaluating each against a validation set before selecting a model, AL affords no such luxury.Over the course of one AL run, an agent annotates its dataset exhausting its labeling budget. Thus, given a new task, an active learner has no opportunity to compare models and acquisition functions. This paper provides a largescale empirical study of deep active learning, addressing multiple tasks and, for each, multiple datasets, multiple models, and a full suite of acquisition functions. We find that across all settings, Bayesian active learning by disagreement, using uncertainty estimates provided either by Dropout or Bayes-by-Backprop significantly improves over i.i.d. baselines and usually outperforms classic uncertainty sampling.

show abstract

mT5: A massively multilingual pre-trained text-to-text transformer

Xue

Constant

Roberts

et al. 2020

Preprint

108

View full text Add to dashboard Cite

The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its stateof-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available. 1

show abstract

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation

Siddhant¹,

Bapna²,

Cao³

et al. 2020

View full text Add to dashboard Cite

Over the last few years two promising research directions in low-resource neural machine translation (NMT) have emerged. The first focuses on utilizing high-resource languages to improve the quality of low-resource languages via multilingual NMT. The second direction employs monolingual data with selfsupervision to pre-train translation models, followed by fine-tuning on small amounts of supervised data. In this work, we join these two lines of research and demonstrate the efficacy of monolingual data with self-supervision in multilingual NMT. We offer three major results: (i) Using monolingual data significantly boosts the translation quality of lowresource languages in multilingual models. (ii) Self-supervision improves zero-shot translation quality in multilingual models. (iii) Leveraging monolingual data with self-supervision provides a viable path towards adding new languages to multilingual models, getting up to 33 BLEU on WMT ro-en translation without any parallel data or back-translation.

show abstract

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation

Ruder¹,

Constant²,

Botha³

et al. 2021

View full text Add to dashboard Cite

Machine learning has brought striking advances in multilingual natural language processing capabilities over the past year. For example, the latest techniques have improved the state-of-the-art performance on the XTREME multilingual benchmark by more than 13 points. While a sizeable gap to humanlevel performance remains, improvements have been easier to achieve in some tasks than in others. This paper analyzes the current state of cross-lingual transfer learning and summarizes some lessons learned. In order to catalyze meaningful progress, we extend XTREME to XTREME-R, which consists of an improved set of ten natural language understanding tasks, including challenging language-agnostic retrieval tasks, and covers 50 typologically diverse languages. In addition, we provide a massively multilingual diagnostic suite (MULTICHECKLIST) and finegrained multi-dataset evaluation capabilities through an interactive public leaderboard to gain a better understanding of such models. 2020. XTREME: A Massively Multilingual Multitask Benchmark for Evaluating Cross-lingual Generalization. In Proceedings of ICML 2020.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Aditya Siddhant

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study

mT5: A massively multilingual pre-trained text-to-text transformer

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation

Contact Info

Product

Resources

About