Jay Mahadeokar scite author profile

We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition. Several modeling choices are discussed in this work, including various positional embedding methods and an iterated loss to enable training deep transformers. We also present a preliminary study of using limited right context in transformer models, which makes streaming applications possible. We demonstrate that on the widely used Librispeech benchmark, our transformer-based AM outperforms the best published hybrid result by 19% to 26% relative when the standard n-gram language model (LM) is used. Combined with a neural network LM for rescoring, our proposed approach achieves state-of-the-art results on Librispeech. Our findings are also confirmed on a much larger internal dataset.

Transformer-Transducer: End-to-End Speech Recognition with Self-Attention

Yeh, Mahadeokar, Kalgaonkar et al. 2019

Preprint

Alignment Restricted Streaming Recurrent Neural Network Transducer

Mahadeokar, Shangguan et al. 2021

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion

Le, Jain, Keren et al. 2021

Preprint

Deep Shallow Fusion for RNN-T Personalization

Keren, Chan et al. 2021

Transformer-Based Acoustic Modeling for Hybrid Speech Recognition
