Petar S. Aleksic scite author profile

Recent work has shown that end-to-end (E2E) speech recognition architectures such as Listen, Attend and Spell (LAS) can achieve state-of-the-art results on large-vocabulary continuous speech recognition (LVCSR) tasks. One benefit of this architecture is that it does not require separately trained pronunciation, language, and acoustic models. However, this property also introduces a drawback: the language model's contribution cannot be adjusted separately from the system as a whole. As a result, incorporating dynamic, contextual information (such as nearby restaurants or upcoming events) into recognition requires a different approach from the one used in conventional systems. We introduce a technique that adapts the inference process to exploit contextual signals by adjusting the output likelihoods of the neural network at each step of the beam search. We apply the proposed method to a LAS E2E model and show its effectiveness in experiments on a voice search task with both artificial and real contextual information. Given optimal context, our system reduces WER from 9.2% to 3.8%. The results show that this technique is effective at incorporating context into the predictions of an E2E system.
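The abstract does not spell out how the output likelihoods are adjusted, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a word-level decoder exposed through a hypothetical `step_log_probs` function, a flat bonus `weight` added to any token that continues a partial match of a context phrase, and a hand-written toy distribution (`toy_model`) standing in for a trained LAS model. All of these names and numbers are illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Tokens = Tuple[str, ...]
Hypothesis = Tuple[Tokens, float]  # (token sequence, cumulative log score)


def context_bonus(prefix: Tokens, token: str,
                  phrases: Sequence[Tokens], weight: float) -> float:
    """Return `weight` if `token` continues a partial match of a context phrase.

    Assumed matching rule (illustrative only): for some k >= 0, the last k
    tokens of `prefix` equal the first k tokens of a phrase, and `token` is
    that phrase's next token.
    """
    for phrase in phrases:
        for k in range(len(phrase)):
            if (len(prefix) >= k
                    and prefix[len(prefix) - k:] == phrase[:k]
                    and token == phrase[k]):
                return weight
    return 0.0


def biased_beam_search(step_log_probs: Callable[[Tokens], Dict[str, float]],
                       phrases: Sequence[Tokens], weight: float,
                       beam_size: int = 4, max_len: int = 20,
                       eos: str = "</s>") -> Hypothesis:
    """Beam search where each step's log-probabilities get a context bonus."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for token, log_p in step_log_probs(tokens).items():
                # Adjust the model's output log-likelihood *before* pruning.
                bonus = 0.0 if token == eos else context_bonus(
                    tokens, token, phrases, weight)
                candidates.append((tokens + (token,), score + log_p + bonus))
        candidates.sort(key=lambda hyp: hyp[1], reverse=True)
        beams = []
        for hyp in candidates[:beam_size]:
            # Hypotheses ending in eos are complete; the rest keep expanding.
            (finished if hyp[0][-1] == eos else beams).append(hyp)
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda hyp: hyp[1])


def toy_model(prefix: Tokens) -> Dict[str, float]:
    """Hypothetical next-word log-probabilities, not from any trained model."""
    table = {
        (): {"call": math.log(0.9), "fall": math.log(0.1)},
        ("call",): {"jon": math.log(0.6), "john": math.log(0.4)},
        ("fall",): {"jon": math.log(0.5), "john": math.log(0.5)},
    }
    return table.get(prefix, {"</s>": 0.0})


if __name__ == "__main__":
    # Without context, the toy model prefers "call jon".
    print(biased_beam_search(toy_model, phrases=[], weight=0.0, beam_size=2)[0])
    # prints ('call', 'jon', '</s>')
    # With "john" supplied as context (e.g. a contact name), the bonus flips it.
    print(biased_beam_search(toy_model, phrases=[("john",)], weight=1.0,
                             beam_size=2)[0])
    # prints ('call', 'john', '</s>')
```

Because the bonus is applied before pruning, a context phrase can keep a hypothesis in the beam that the unbiased scores would have dropped. This simplified version never withdraws the bonus when a partial match fails to complete, which a practical system would likely need to handle.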

Audio-Visual Biometrics
Aleksic, Katsaggelos. 2006. Proc. IEEE. 101

Improved recognition of contact names in voice commands
Aleksic, Allauzen, Elson, et al. 2015

Bringing contextual information to Google speech recognition
Aleksic, Ghodsi, Michaely, et al. 2015

Petar S. Aleksic

Automatic Facial Expression Recognition Using Facial Animation Parameters and Multistream HMMs

Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search