Interspeech 2017
DOI: 10.21437/interspeech.2017-583
Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting

Abstract: It has been shown in [1,2] that improved performance can be achieved by formulating keyword spotting as a non-uniform error automatic speech recognition problem. In this work, we discriminatively train a deep bidirectional long short-term memory (BLSTM)–hidden Markov model (HMM) acoustic model with a non-uniform boosted minimum classification error (BMCE) criterion, which imposes a higher error cost on keywords than on non-keywords. By introducing the BLSTM, the context information…
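The core idea in the abstract is keyword-weighted discriminative training. The sketch below is a minimal, hypothetical illustration of a non-uniform boosted-MCE-style loss in PyTorch; the function name, the `boost` weighting scheme, and the sigmoid slope `alpha` are assumptions made for illustration, not the paper's exact formulation.

```python
import torch

def non_uniform_bmce_loss(log_scores, targets, keyword_mask, boost=2.0, alpha=1.0):
    # log_scores:   (batch, num_classes) per-frame discriminant scores g_j(x)
    # targets:      (batch,) index of the correct class for each frame
    # keyword_mask: (batch,) 1.0 where the correct class belongs to a keyword, else 0.0
    g_correct = log_scores.gather(1, targets.unsqueeze(1)).squeeze(1)
    # Anti-discriminant: log-sum-exp over all competing (incorrect) classes.
    competing = log_scores.scatter(1, targets.unsqueeze(1), float('-inf'))
    g_anti = torch.logsumexp(competing, dim=1)
    # MCE misclassification measure d(x) = g_anti - g_correct, smoothed by a sigmoid.
    smoothed_error = torch.sigmoid(alpha * (g_anti - g_correct))
    # Non-uniform weighting: keyword frames pay `boost` times the error cost.
    weights = 1.0 + (boost - 1.0) * keyword_mask
    return (weights * smoothed_error).mean()
```

In a real BLSTM-HMM setup the scores would come from per-frame network outputs and the keyword mask from a forced alignment; both are left as plain inputs here so that the non-uniform weighting itself stays visible.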

Cited by 3 publications (2 citation statements) · References 28 publications (29 reference statements)
“…Spoken term detection has been a hot topic in the past few years due to its many applications and has received a great deal of interest from many outstanding companies and research institutes, such as IBM [14,32–36], BBN [37–39], SRI & OGI [40–42], BUT [17,43,44], Microsoft [45], QUT [46,47], JHU [16,48–50], Fraunhofer IAIS/NTNU/TUD [15], NTU [31,51], IDIAP [52] and Google [21], among others. Within an STD system, the ASR subsystem uses mostly word-based speech recognition [24,41,53–59] due to its better performance in comparison with subword-based approaches.…”
Section: Spoken Term Detection (mentioning)
confidence: 99%
“…For the ASR stage, word-based speech recognition has been widely used [35,48–54], since this typically yields better performance than subword-based ASR [55–62] due to the lexical and language model (LM) information employed by the word-based ASR. However, one of the main drawbacks of word-based ASR is that it can only detect in-vocabulary (INV) terms.…”
Section: Spoken Term Detection Overview (mentioning)
confidence: 99%
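The in-vocabulary restriction mentioned in the quoted statement amounts to a lexicon lookup before search: a word-based decoder can only emit words it knows. The toy check below uses a hypothetical lexicon and query list purely to illustrate why an out-of-vocabulary query term can never be matched by such a system without a subword or phonetic fallback.

```python
# Hypothetical lexicon and query terms, for illustration only.
lexicon = {"keyword", "spotting", "speech", "recognition"}
query_terms = ["keyword", "spotting", "zettabyte"]

inv_terms = [t for t in query_terms if t in lexicon]      # searchable by word-based ASR
oov_terms = [t for t in query_terms if t not in lexicon]  # need a subword/phonetic fallback
print("in-vocabulary:", inv_terms)
print("out-of-vocabulary:", oov_terms)
```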