James Apfel scite author profile

End-to-end (E2E) automatic speech recognition (ASR) systems lack the distinct language model (LM) component that characterizes traditional speech systems. While this simplifies the model architecture, it complicates the task of incorporating text-only data into training, which is important to the recognition of tail words that do not occur often in audio-text pairs. Shallow fusion has been proposed as a method for incorporating a pre-trained LM into an E2E model at inference time, but it has not yet been explored for very large text corpora, and it has been shown to be very sensitive to hyperparameter settings in the beam search. In this work, we apply shallow fusion to incorporate a very large text corpus into a state-of-the-art E2E ASR model. We explore the impact of model size and show that intelligent pruning of the training set can be more effective than increasing the parameter count. Additionally, we show that incorporating the LM in minimum word error rate (MWER) fine-tuning makes shallow fusion far less dependent on optimal hyperparameter settings, reducing the difficulty of that tuning problem.
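As a rough illustration of the shallow fusion idea mentioned in the abstract, the sketch below combines an E2E model score with a weighted LM score and picks the best hypothesis. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the hypothesis list, and all log-probabilities are hypothetical, and for brevity the interpolation is applied to complete hypotheses, whereas in practice it is typically applied at each step of the beam search.

```python
def shallow_fusion_score(e2e_logprob, lm_logprob, lm_weight):
    """Log-linear interpolation: log p_e2e + lm_weight * log p_lm."""
    return e2e_logprob + lm_weight * lm_logprob


def pick_best(hypotheses, lm_weight):
    """Return the text of the highest-scoring hypothesis.

    hypotheses: list of (text, e2e_logprob, lm_logprob) tuples.
    """
    best = max(
        hypotheses,
        key=lambda h: shallow_fusion_score(h[1], h[2], lm_weight),
    )
    return best[0]


# Hypothetical scores: the E2E model slightly prefers the misspelled
# name, while the LM (trained on text only) strongly prefers the
# correct spelling.
hyps = [
    ("call jon", -1.0, -6.0),
    ("call john", -1.2, -2.0),
]

best_no_lm = pick_best(hyps, lm_weight=0.0)    # E2E score only
best_with_lm = pick_best(hyps, lm_weight=0.3)  # E2E + weighted LM score
```

The `lm_weight` parameter here stands in for the kind of beam search hyperparameter the abstract describes as hard to tune: changing it can flip which hypothesis wins.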

Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus

Peyser, Sepand, Sainath et al. 2020

Preprint

Neural Oracle Search on N-BEST Hypotheses

Variani, Chen, Apfel et al. 2020

RNN-Transducer with Stateless Prediction Network
