2014 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt.2014.7078631

Online word-spotting in continuous speech with recurrent neural networks

Abstract: In this paper we introduce a simplified architecture for gated recurrent neural networks that can be used in single-pass applications, where word-spotting needs to be done in real time and phoneme-level information is not available for training. The network operates as a self-contained block in a strictly forward-pass configuration to directly generate keyword labels. We call these simple networks causal networks, where the current output is weighted only by the past inputs and outputs. Since the basic net…
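The "causal" property described in the abstract can be sketched as a recurrent step whose output at each frame depends only on the current input and the previous state, so the network runs strictly forward over a stream. This is a minimal illustrative NumPy sketch, not the paper's actual gated architecture; all names and shapes are assumptions.

```python
import numpy as np

def causal_step(x_t, h_prev, W_x, W_h, b):
    """One strictly forward recurrent step: no future frames are consulted."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8                              # toy dimensions
W_x = rng.standard_normal((n_hid, n_in)) * 0.1
W_h = rng.standard_normal((n_hid, n_hid)) * 0.1
b = np.zeros(n_hid)

h = np.zeros(n_hid)
for x_t in rng.standard_normal((10, n_in)):     # simulated feature frames
    h = causal_step(x_t, h, W_x, W_h, b)        # online, frame by frame
```

Because each step needs only the previous state, keyword labels can be emitted in real time as frames arrive, which is the single-pass setting the abstract targets.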

Cited by 29 publications (18 citation statements). References 9 publications (12 reference statements).
“…What we saw is that our results in terms of keyword search quality fall in between those reported for Cantonese when GMMs are used in the acoustic model and are slightly worse when deep neural networks are used (MTWV 0.335 and 0.441, resp.). As for the real-time factor our results outperform those reported in [14], which may be attributed to a relatively small number of Gaussians we use per senone.…”
Section: Results (contrasting)
confidence: 44%
“…An average MTWV reported for these languages ranges from 0.22 for Zulu to 0.67 for Haitian Creole. In [14] the use of recurrent neural networks for example-based word spotting in real time for English is described. Compared to more widespread text-based systems, this approach makes use of spoken examples of a keyword to build up a word-based model and then do the search within speech data.…”
Section: Introduction (mentioning)
confidence: 99%
“…They have demonstrated efficiency in terms of inference speed and computational cost but fail at capturing large patterns with reasonably small models. Recent works have suggested RNN based keyword spotting using LSTM cells that can leverage longer temporal context using gating mechanism and internal states [7,8,9]. However, because RNNs may suffer from state saturation when facing continuous input streams [10], their internal state needs to be periodically reset.…”
Section: Introduction (mentioning)
confidence: 99%
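The state-saturation issue raised in the excerpt above is typically handled with a periodic state reset. The sketch below is illustrative only: the `stream_with_reset` helper, the toy update rule, and the reset interval are assumptions, not details from the cited works.

```python
import numpy as np

def stream_with_reset(frames, step, h0, reset_every=100):
    """Run a recurrent step over a continuous stream, zeroing the
    internal state every `reset_every` frames to avoid saturation."""
    h = h0.copy()
    outputs = []
    for t, x_t in enumerate(frames):
        if t > 0 and t % reset_every == 0:
            h = h0.copy()                      # periodic reset
        h = step(x_t, h)
        outputs.append(h.copy())
    return outputs

# Toy recurrent update: on constant input the state drifts toward
# saturation, which the periodic reset interrupts.
step = lambda x, h: np.tanh(0.9 * h + x)
frames = np.ones((250, 1))
outs = stream_with_reset(frames, step, np.zeros(1), reset_every=100)
```

Right after each reset the state is rebuilt from scratch, trading a brief loss of temporal context for bounded, non-saturated activations on an unbounded input stream.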
“…Alternative segment-level loss functions include different statistics of frame-level keyword posteriors within a keyword segment, e.g., the geometric mean. There is also literature on training LSTMs with Connectionist Temporal Classification (CTC) [14,15,16,23] for keyword spotting tasks. In addition, architectures that combine LSTMs and CNNs have been applied to different tasks [24,25].…”
Section: Max-pooling (mentioning)
confidence: 99%
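The segment-level statistics contrasted in the excerpt above can be sketched as follows: max-pooling scores a segment by its single most confident frame, while the geometric mean rewards posteriors that stay high across the whole segment. The function name and toy values are illustrative, not from the cited papers.

```python
import numpy as np

def segment_score(posteriors, mode="max"):
    """Collapse frame-level keyword posteriors into one segment score."""
    p = np.asarray(posteriors, dtype=float)
    if mode == "max":
        return float(p.max())                        # max-pooling
    if mode == "geomean":
        # log-domain mean for numerical stability near zero
        return float(np.exp(np.mean(np.log(p + 1e-12))))
    raise ValueError(f"unknown mode: {mode}")

frame_posteriors = [0.1, 0.2, 0.9, 0.3]              # toy keyword segment
peak = segment_score(frame_posteriors, "max")        # 0.9
mean = segment_score(frame_posteriors, "geomean")    # ≈ 0.27
```

A single confident frame drives the max-pooled score up, whereas the geometric mean stays low unless most frames agree, which changes what the training loss encourages.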