Aditya Gourav scite author profile

Aditya Gourav

5 Publications

10 Citation Statements Received

123 Citation Statements Given

Affiliations

Amazon (United States)

Publications

Personalization Strategies for End-to-End Speech Recognition Systems

Gourav, Liu, Gandhe, et al. 2021

The recognition of personalized content, such as contact names, remains a challenging problem for end-to-end speech recognition systems. In this work, we demonstrate how first- and second-pass rescoring strategies can be leveraged together to improve the recognition of such words. Following previous work, we use a shallow fusion approach to bias towards recognition of personalized content in the first-pass decoding. We show that such an approach can improve personalized content recognition by up to 16% with minimum degradation on the general use case. We describe a fast and scalable algorithm that enables our biasing models to remain at the word level, while applying the biasing at the subword level. This has the advantage of not requiring the biasing models to be dependent on any subword symbol table. We also describe a novel second-pass de-biasing approach: used in conjunction with a first-pass shallow fusion that optimizes on oracle WER, we can achieve an additional 14% improvement on personalized content recognition, and even improve accuracy for the general use case by up to 2.5%.

Domain-Aware Neural Language Models for Speech Recognition

Liu, Gourav, et al. 2021

As voice assistants become more ubiquitous, they are increasingly expected to support and perform well on a wide variety of use-cases across different domains. We present a domain-aware rescoring framework suitable for achieving domain adaptation during second-pass rescoring in production settings. In our framework, we fine-tune a domain-general neural language model on several domains, and use an LSTM-based domain classification model to select the appropriate domain-adapted model to use for second-pass rescoring. This domain-aware rescoring improves the word error rate by up to 2.4% and slot word error rate by up to 4.1% on three individual domains (shopping, navigation, and music) compared to domain-general rescoring. These improvements are obtained while maintaining accuracy for the general use case.

Domain-aware Neural Language Models for Speech Recognition

Liu, Gourav, et al. 2021

Preprint

Personalization Strategies for End-to-End Speech Recognition Systems

Gourav, Liu, Gandhe, et al. 2021

Preprint

Personalization for BERT-based Discriminative Speech Recognition Rescoring

Kolehmainen, Gu, Gourav, et al. 2023

Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention based encoder-decoder model. We use internal de-identified en-US data from interactions with a virtual voice assistant supplemented with personalized named entities to compare these approaches. On a test set with personalized named entities, we show that each of these approaches improves word error rate by over 10%, against a neural rescoring baseline. We also show that on this test set, natural language prompts can improve word error rate by 7% without any training and with a marginal loss in generalization. Overall, gazetteers were found to perform the best with a 10% improvement in word error rate (WER), while also improving WER on a general test set by 1%.

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
