Johan Schalkwyk scite author profile

An important goal at Google is to make spoken access ubiquitously available. Achieving ubiquity requires two things: availability (i.e., built into every possible interaction where speech input or output can make sense) and performance (i.e., works so well that the modality adds no friction to the interaction). This chapter is a case study of the development of Google Search by Voice, a step toward our long-term vision of ubiquitous access. While the integration of speech input into Google search is a significant step toward more ubiquitous access, it has posed many problems in terms of the performance of core speech technologies and the design of effective user interfaces. Work is ongoing and no doubt the problems are far from solved. Nonetheless, we have at the minimum achieved a level of performance showing that usage of voice search is growing rapidly, and that many users do indeed become repeat users.

Learning acoustic frame labeling for speech recognition with recurrent neural networks

Sak et al. 2015

Filters for Efficient Composition of Weighted Finite-State Transducers

Allauzen, Riley, Schalkwyk 2011

Deploying GOOG-411: Early lessons in data, measurement, and testing

Bacchiani, Beaufays, Schalkwyk et al. 2008

We describe our early experience building and optimizing GOOG-411, a fully automated, voice-enabled, business finder. We show how taking an iterative approach to system development allows us to optimize the various components of the system, thereby progressively improving user-facing metrics. We show the contributions of different data sources to recognition accuracy. For business listing language models, we see a nearly linear performance increase with the logarithm of the amount of training data. To date, we have improved our correct accept rate by 25% absolute, and increased our transfer rate by 35% absolute.

Query language modeling for voice search

Chelba, Schalkwyk, Brants et al. 2010

The paper presents an empirical exploration of google.com query stream language modeling. We describe the normalization of the typed query stream resulting in out-of-vocabulary (OoV) rates below 1% for a one million word vocabulary. We present a comprehensive set of experiments that guided the design decisions for a voice search service. In the process we re-discovered a less known interaction between Kneser-Ney smoothing and entropy pruning, and found empirical evidence that hints at non-stationarity of the query stream, as well as strong dependence on various English locales: USA, Britain and Australia.

Long short term memory neural network for keyboard gesture decoding

Alsharif, Ouyang, Beaufays et al. 2015

On lattice generation for large vocabulary speech recognition

Rybach, Riley, Schalkwyk 2017

Johan Schalkwyk

OpenFst: A General and Efficient Weighted Finite-State Transducer Library

“Your Word is my Command”: Google Search by Voice: A Case Study
