Paden Tomasello scite author profile

Paden Tomasello

5 Publications

190 Citation Statements Received

81 Citation Statements Given


Affiliations

Meta (Israel)

Publications


Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters

Pratap¹, Sriram², Tomasello³ et al. 2020


We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages, and overall simplifying deployment of ASR systems that support diverse languages. We perform an extensive benchmark on 51 languages, with varying amounts of training data by language (from 100 hours to 1100 hours). We compare three variants of multilingual training: from a single joint model without knowing the input language, to using this information, to multiple heads (one per language "cluster"). We show that multilingual training of ASR models on several languages can improve recognition performance, in particular on low-resource languages. We see 20.9%, 23% and 28.8% average relative WER reduction compared to monolingual baselines on the joint model, the joint model with language input, and the multi-head model, respectively. To our knowledge, this is the first work studying multilingual ASR at massive scale, with more than 50 languages and more than 16,000 hours of audio across them.


Self-Training and Pre-Training are Complementary for Speech Recognition

Baevski, Likhomanenko et al. 2021

101


Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data. However, it is not clear whether they learn similar patterns or if they can be effectively combined. In this paper, we show that pseudo-labeling and pre-training with wav2vec 2.0 are complementary in a variety of labeled data setups. Using just 10 minutes of labeled data from Libri-light as well as 53k hours of unlabeled data from LibriVox achieves word error rates (WER) of 2.8%/4.8% on the clean and other test sets of Librispeech, rivaling the best published systems trained on 960 hours of labeled data only a year ago. Training on all labeled data of Librispeech achieves WERs of 1.5%/3.1%.


Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters

Pratap¹, Sriram², Tomasello³ et al. 2020

Preprint


Rethinking Evaluation in ASR: Are Our Models Robust Enough?

Likhomanenko¹, Xu², Pratap³ et al. 2020

Preprint


Is pushing numbers on a single benchmark valuable in automatic speech recognition? Research results in acoustic modeling are typically evaluated based on performance on a single dataset. While the research community has coalesced around various benchmarks, we set out to understand generalization performance in acoustic modeling across datasets, in particular, if models trained on a single dataset transfer to other (possibly out-of-domain) datasets. Further, we demonstrate that when a large enough set of benchmarks is used, average word error rate (WER) performance over them provides a good proxy for performance on real-world data. Finally, we show that training a single acoustic model on the most widely used datasets, combined, reaches competitive performance on both research and real-world benchmarks.


Self-training and Pre-training are Complementary for Speech Recognition

Baevski, Likhomanenko et al. 2020

Preprint


Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data. However, it is not clear whether they learn similar patterns or if they can be effectively combined. In this paper, we show that pseudo-labeling and pre-training with wav2vec 2.0 are complementary in a variety of labeled data setups. Using just 10 minutes of labeled data from Libri-light as well as 53k hours of unlabeled data from LibriVox achieves WERs of 3.0%/5.2% on the clean and other test sets of Librispeech, rivaling the best published systems trained on 960 hours of labeled data only a year ago. Training on all labeled data of Librispeech achieves WERs of 1.5%/3.1%.


