Petr Zadrazil scite author profile

Petr Zadrazil

5 Publications

62 Citation Statements Received

75 Citation Statements Given

Affiliations

Google (United States)

Publications

Personalization of End-to-End Speech Recognition on Mobile Devices for Named Entities

Sim, Johnson, Motta, et al. 2019

We study the effectiveness of several techniques to personalize end-to-end speech models and improve the recognition of proper names relevant to the user. These techniques differ in the amounts of user effort required to provide supervision, and are evaluated on how they impact speech recognition performance. We propose using keyword-dependent precision and recall metrics to measure vocabulary acquisition performance. We evaluate the algorithms on a dataset that we designed to contain names of persons that are difficult to recognize. Therefore, the baseline recall rate for proper names in this dataset is very low: 2.4%. A data synthesis approach we developed brings it to 48.6%, with no need for speech input from the user. With speech input, if the user corrects only the names, the name recall rate improves to 64.4%. If the user corrects all the recognition errors, we achieve the best recall of 73.5%. To eliminate the need to upload user data and store personalized models on a server, we focus on performing the entire personalization workflow on a mobile device.
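The abstract above mentions keyword-dependent precision and recall but does not give their definitions. The following is a minimal sketch under an assumed, count-based definition: a keyword occurrence counts as correct when it appears in both the reference and the hypothesis for the same utterance. The function name `keyword_precision_recall`, the whitespace tokenization, and the sample names are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def keyword_precision_recall(references, hypotheses, keywords):
    """Return (precision, recall) for `keywords` over paired transcripts.

    Assumed definition (not taken from the paper):
      recall    = matched keyword occurrences / keyword occurrences in references
      precision = matched keyword occurrences / keyword occurrences in hypotheses
    """
    keywords = {k.lower() for k in keywords}
    matched = ref_total = hyp_total = 0
    for ref, hyp in zip(references, hypotheses):
        # Count only the tokens that belong to the keyword set.
        ref_counts = Counter(w for w in ref.lower().split() if w in keywords)
        hyp_counts = Counter(w for w in hyp.lower().split() if w in keywords)
        ref_total += sum(ref_counts.values())
        hyp_total += sum(hyp_counts.values())
        # Counter intersection keeps the minimum count per keyword.
        matched += sum((ref_counts & hyp_counts).values())
    precision = matched / hyp_total if hyp_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    return precision, recall


# Hypothetical example: one of two names is recognized correctly.
refs = ["call siobhan now", "text nguyen please"]
hyps = ["call siobhan now", "text win please"]
precision, recall = keyword_precision_recall(refs, hyps, {"siobhan", "nguyen"})
```

In this toy case the single name that the recognizer emits is correct, while only one of the two reference names is recovered, so precision and recall diverge.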

An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models

Sim, Zadrazil, Beaufays 2019

Speaker-independent speech recognition systems trained with data from many users are generally robust against speaker variability and work well for a large population of speakers. However, these systems do not always generalize well for users with very different speech characteristics. This issue can be addressed by building personalized systems that are designed to work well for each specific user. In this paper, we investigate the idea of securely training personalized end-to-end speech recognition models on mobile devices so that user data and models never leave the device and are never stored on a server. We study how the mobile training environment impacts performance by simulating on-device data consumption. We conduct experiments using data collected from speech-impaired users for personalization. Our results show that personalization achieved 63.7% relative word error rate reduction when trained in a server environment and 58.1% in a mobile environment. Moving to on-device personalization resulted in 18.7% performance degradation, in exchange for improved scalability and data privacy. To train the model on device, we split the gradient computation into two and achieved 45% memory reduction at the expense of 42% increase in training time.

An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models

Sim, Zadrazil, Beaufays 2019

Preprint

Low-Rank Gradient Approximation for Memory-Efficient on-Device Training of Deep Neural Network

Gooneratne, Sim, Zadrazil, et al. 2020

Training machine learning models on mobile devices has the potential of improving both privacy and accuracy of the models. However, one of the major obstacles to achieving this goal is the memory limitation of mobile devices. Reducing training memory enables models with high-dimensional weight matrices, like automatic speech recognition (ASR) models, to be trained on-device. In this paper, we propose approximating the gradient matrices of deep neural networks using a low-rank parameterization as an avenue to save training memory. The low-rank gradient approximation enables more advanced, memory-intensive optimization techniques to be run on device. Our experimental results show that we can reduce the training memory by about 33.0% for Adam optimization. The approach uses memory comparable to momentum optimization and achieves a 4.5% relative reduction in word error rate on an ASR personalization task.
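To illustrate why a low-rank parameterization of a gradient matrix can save memory, here is a rough sketch that compares the number of stored values for a dense gradient with the number for a rank-r factorization into two thin matrices. The function names, the matrix sizes, and the rank are assumptions chosen for illustration; the sketch does not reproduce the paper's method or its reported 33.0% figure, which refers to overall training memory.

```python
def dense_entries(rows, cols):
    """Values stored for a full rows x cols gradient matrix."""
    return rows * cols


def low_rank_entries(rows, cols, rank):
    """Values stored for factors of shape rows x rank and rank x cols."""
    return rank * (rows + cols)


def reconstruct(left, right):
    """Multiply left (rows x rank) by right (rank x cols), both nested lists."""
    rank = len(right)
    cols = len(right[0])
    return [
        [sum(row[k] * right[k][j] for k in range(rank)) for j in range(cols)]
        for row in left
    ]


# Hypothetical layer size and rank, chosen only for illustration.
rows, cols, rank = 1024, 1024, 8
full = dense_entries(rows, cols)              # 1,048,576 values
compact = low_rank_entries(rows, cols, rank)  # 16,384 values

# A tiny rank-1 example: the 2 x 3 product is described by 5 stored values.
approx = reconstruct([[1.0], [2.0]], [[3.0, 4.0, 5.0]])
```

The saving grows as the rank shrinks relative to the matrix dimensions; the cost is that only gradients expressible as such a product can be represented.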

Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities

Chai, Francoise, Benard, et al. 2019

Preprint
