2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8462227

Monophone-Based Background Modeling for Two-Stage On-Device Wake Word Detection

Cited by 70 publications (41 citation statements)
References 21 publications
“…Keyword spotting is the task of detecting a specific word in continuous speech signal. In recent years, keyword spotting gains substantial performance improvements with deep learning algorithms [1,2,3,4,5]. More recently, end-to-end trained models have been successfully applied in automatic speech recognition (ASR) [6,7,8,9,10] and KWS [11,12,13,14].…”
Section: Introduction (mentioning)
confidence: 99%
“…Traditional approaches of KWS are based on the keyword/filler Hidden Markov Model (HMM) [4,5], which are trained for both keyword and non-keyword audio segments. Techniques such as attention-based model [6] and time delay neural network [7] have also been explored for better performance.…”
Section: Introduction (mentioning)
confidence: 99%
“…They range from traditional approaches that make use of a Hidden Markov Model (HMM) to characterize acoustic features from a DNN into both "keyword" and "background" (i.e. nonkeyword speech and noise) classes [1,2,3,4,5]. Simpler derivatives of that approach perform a temporal integration computation that verifies the outputs of the acoustic model are high in the right sequence for the target keyword in order to produce a single detection likelihood score [6,7,8,9,10].…”
Section: Introduction (mentioning)
confidence: 99%
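
The last statement above describes the "temporal integration" family of detectors only in outline. As a rough illustration, and not the method of any cited paper, the sketch below smooths per-frame acoustic-model posteriors and then scores how well the keyword's units fire in the expected order. The function names (`smooth_posteriors`, `ordered_score`, `keyword_score`), the causal moving-average window, and the geometric-mean scoring are all assumptions made for this example.

```python
import numpy as np


def smooth_posteriors(posteriors, win):
    """Causal moving average over up to `win` frames (hypothetical helper)."""
    num_frames = posteriors.shape[0]
    out = np.empty(posteriors.shape, dtype=float)
    for t in range(num_frames):
        start = max(0, t - win + 1)
        out[t] = posteriors[start:t + 1].mean(axis=0)
    return out


def ordered_score(posteriors):
    """Best product of one posterior per keyword unit, taken at strictly
    increasing frame indices in unit order, returned as a geometric mean.

    `posteriors` has shape (frames, units); column k is keyword unit k.
    """
    num_frames, num_units = posteriors.shape
    # best[k]: best product found so far for units 0..k in the right order.
    best = np.zeros(num_units)
    for t in range(num_frames):
        # Go from the last unit to the first so best[k - 1] still holds the
        # value from earlier frames when unit k is extended at frame t.
        for k in range(num_units - 1, 0, -1):
            best[k] = max(best[k], best[k - 1] * posteriors[t, k])
        best[0] = max(best[0], posteriors[t, 0])
    return float(best[-1] ** (1.0 / num_units))


def keyword_score(posteriors, smooth_win=2):
    """Single detection score: smooth, then score the ordered sequence."""
    return ordered_score(smooth_posteriors(posteriors, smooth_win))


# Toy input: three keyword units, each active for two consecutive frames.
in_order = np.full((6, 3), 0.05)
in_order[0:2, 0] = 0.9
in_order[2:4, 1] = 0.9
in_order[4:6, 2] = 0.9
reversed_order = in_order[:, ::-1].copy()

score_in_order = keyword_score(in_order)
score_reversed = keyword_score(reversed_order)
```

On this toy input the in-order sequence scores far higher than the reversed one, which is the property the quoted statement attributes to such detectors. A real system would compare the score against a tuned threshold over a sliding window of audio.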