Advances in Speech Recognition 2010
DOI: 10.1007/978-1-4419-5951-5_4

“Your Word is my Command”: Google Search by Voice: A Case Study

Abstract: An important goal at Google is to make spoken access ubiquitously available. Achieving ubiquity requires two things: availability (i.e., built into every possible interaction where speech input or output can make sense) and performance (i.e., works so well that the modality adds no friction to the interaction). This chapter is a case study of the development of Google Search by Voice, a step toward our long-term vision of ubiquitous access. While the integration of speech input into Google search is a significa…

Citations: cited by 204 publications (138 citation statements)
References: 9 publications
“…An empirical study of ASGD for sequence training was conducted for a small and a large Voice Search task [21]. The small size of the Icelandic dataset allows for a thorough evaluation of the different issues and hyper-parameters while the large scale English dataset demonstrates the scalability of the algorithm.…”
Section: Methods | Citation type: mentioning
confidence: 99%
“…The goal of this paper is to look at the complexity/accuracy tradeoff of such a system. To explore this tradeoff, we trained a collection of systems of various complexities on two datasets: US English Voice Search [5] and voice typing, and Iberian Portuguese Voice Search, by varying the width of the hidden layers of the DNN acoustic model. The rest of the system was kept fixed: the frontend consists of 40 log-filterbank energies computed every 10 ms, stacked 20 frames in the past and 5 frames in the future to limit latency.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
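
The frontend described in the statement above (40 log-filterbank energies per 10 ms frame, stacked with 20 past and 5 future frames) can be illustrated with a minimal sketch. This is not the citing authors' code. The function name `stack_frames`, the inclusion of the current frame in the stack, and the edge handling (repeating the first or last frame) are assumptions made for illustration only.

```python
from typing import List, Sequence


def stack_frames(
    frames: Sequence[Sequence[float]],
    past: int = 20,
    future: int = 5,
) -> List[List[float]]:
    """Concatenate each frame with its `past` previous and `future` next frames.

    Assumptions (not stated in the quoted text): the current frame is part of
    the stack, and frames beyond the utterance edges are filled by repeating
    the first or last frame.
    """
    if not frames:
        return []
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window: List[float] = []
        for offset in range(-past, future + 1):
            # Clamp the index so edge frames reuse the nearest valid frame.
            idx = min(max(t + offset, 0), last)
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


# Toy input: 30 frames of 40 values each, where frame i holds the value i.
frames = [[float(i)] * 40 for i in range(30)]
stacked = stack_frames(frames)
```

Under these assumptions each stacked vector has (20 + 1 + 5) × 40 = 1040 values, and the 5 future frames at a 10 ms step correspond to 50 ms of lookahead, which is the latency cost the statement alludes to.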
“…As a baseline, we have used a standard speech recognition front end computing perceptual linear predictive (PLP) features [8]. Thirteen cepstral coefficients are computed every 10 ms over 25 ms windows, and the energy coefficient C0 is discarded, which decreases the sensitivity to changes in volume.…”
Section: Acoustic Front-end | Citation type: mentioning
confidence: 99%
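
The framing arithmetic in the statement above (25 ms windows every 10 ms, thirteen cepstral coefficients with C0 discarded) can be sketched as follows. The sketch does not compute PLP features; it only shows the window layout and the removal of C0. The names `frame_bounds` and `drop_c0`, the 16 kHz sample rate in the usage lines, and the choice to drop any trailing partial window are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def frame_bounds(
    num_samples: int,
    sample_rate: int,
    window_ms: float = 25.0,
    hop_ms: float = 10.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of each full analysis window.

    Assumption: a trailing partial window is dropped rather than padded.
    """
    window = int(round(sample_rate * window_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    bounds = []
    start = 0
    while start + window <= num_samples:
        bounds.append((start, start + window))
        start += hop
    return bounds


def drop_c0(cepstra: Sequence[float]) -> List[float]:
    """Discard the first (energy) coefficient C0, keeping the rest."""
    return list(cepstra[1:])


# One second of audio at an assumed 16 kHz sample rate.
bounds = frame_bounds(num_samples=16000, sample_rate=16000)
# Placeholder values standing in for 13 cepstral coefficients C0..C12.
kept = drop_c0([float(i) for i in range(13)])
```

With these values each window spans 400 samples and consecutive windows start 160 samples apart, and dropping C0 leaves twelve coefficients per frame.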