Interspeech 2017
DOI: 10.21437/interspeech.2017-234

Acoustic Modeling for Google Home

Abstract: This paper describes the technical and system building advances made to the Google Home multichannel speech recognition system, which was launched in November 2016. Technical advances include an adaptive dereverberation frontend, the use of neural network models that do multichannel processing jointly with acoustic modeling, and Grid-LSTMs to model frequency variations. On the system level, improvements include adapting the model using Google Home specific data. We present results on a variety of multichannel …
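The first layer of a jointly trained multichannel acoustic model can be viewed as learned filter-and-sum beamforming: each microphone signal passes through its own FIR filter and the filtered channels are summed into one enhanced signal. The sketch below is a minimal illustration of that operation in NumPy; the shapes, filter lengths, and random signals are toy assumptions, not the configuration used in the paper.

```python
import numpy as np

def filter_and_sum(channels, filters):
    """Filter-and-sum front end: convolve each microphone channel with
    its own FIR filter, then sum across channels into one signal.
    channels: (num_mics, num_samples); filters: (num_mics, filter_len).
    """
    num_mics, num_samples = channels.shape
    out = np.zeros(num_samples + filters.shape[1] - 1)
    for m in range(num_mics):
        out += np.convolve(channels[m], filters[m])  # per-channel FIR, then sum
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 100))      # toy 2-mic recording
h = rng.standard_normal((2, 5)) * 0.1  # hypothetical learned FIR taps
y = filter_and_sum(x, h)
print(y.shape)  # (104,)
```

In the jointly trained setting, the taps `h` would be network parameters optimized with the acoustic model rather than fixed by a classical beamformer design.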

Cited by 151 publications (113 citation statements)
References 11 publications
“…The usefulness of these techniques, particularly for ASR, has been extensively studied, e.g., at the REVERB challenge [19] and the CHiME-3/4/5 challenges [20]- [22]. Moreover, advances in these techniques have led to recent progress on commercial devices, such as smart speakers [23]- [25].…”
Section: Introduction (mentioning; confidence: 99%)
“…However, it is still unclear how to design a suitable neural network architecture to exploit temporal and frequency information when deriving an effective speech emotion representation. In [14,15], 2D Time-Frequency (TF) LSTMs and Grid-LSTMs were proposed to model variation over time and frequency for large-scale automatic speech recognition (ASR). However, complex model architectures are prone to overfitting on a small-scale dataset such as IEMOCAP [16].…”
Section: Introduction (mentioning; confidence: 99%)
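The frequency-scanning idea mentioned in the statement above can be sketched with a toy NumPy LSTM: at each time frame, a small LSTM scans along the frequency axis of a log-mel spectrogram so that local spectral patterns are summarized before any time recurrence. The cell, shapes, and single-direction scan below are illustrative assumptions, not the Grid-LSTM architecture of the cited works.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_scan(seq, W, U, b, hidden):
    """Run a single-layer LSTM over seq of shape (steps, feat_dim);
    gate weights are stacked as [input, forget, output, candidate]."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outs = []
    for x in seq:
        z = W @ x + U @ h + b
        i, f, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(z[3 * hidden:])
        c = f * c + i * g          # cell state update
        h = o * np.tanh(c)         # hidden state (per-step output)
        outs.append(h)
    return np.stack(outs)

# Frequency-LSTM: scan each time frame along its frequency bins.
rng = np.random.default_rng(0)
T, F, H = 4, 10, 8                     # toy frames, mel bins, hidden size
spec = rng.standard_normal((T, F, 1))  # toy log-mel "spectrogram"
W = rng.standard_normal((4 * H, 1)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
freq_feats = np.stack([lstm_scan(spec[t], W, U, b, H) for t in range(T)])
print(freq_feats.shape)  # (4, 10, 8)
```

A time LSTM (or any other acoustic-model stack) could then consume `freq_feats` frame by frame; a Grid-LSTM additionally couples the time and frequency recurrences rather than running them in sequence.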
“…This improvement has come about from the shift from Gaussian Mixture Models (GMMs) to Feed-Forward Deep Neural Networks (FF-DNNs), and from FF-DNNs to Recurrent Neural Networks (RNNs), in particular Long Short-Term Memory (LSTM) networks [9]. Thanks to these advances, voice assistant devices such as Google Home [2,10], Amazon Alexa, and Samsung Bixby [11] are being used in many homes and on personal devices.…”
Section: Introduction (mentioning; confidence: 99%)