An Application of Recurrent Neural Networks to Discriminative Keyword Spotting

Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen

doi:10.1007/978-3-540-74695-9_23

Cited by 196 publications

(157 citation statements)

References 13 publications

Supporting

Mentioning

150

Contrasting

Unclassified


“…This performed better than traditional RNN, as the network can learn from experience, given an appropriate input weight matrix, to classify and predict time series in a long time variance sequence. It has outperformed HMM and RNN as a sequence learning method in some applications, such as unsegmented cursive handwriting [34] and speech applications [35]. Its architecture is made up of RNN + LSTM blocks, which augment the network by remembering arbitrary values over a long period of time.…”

Section: Neural Network and Some Extensions (mentioning)

confidence: 99%

On the Problem of Features Variability in Sequence Learning Problems

Yahaya

2015

IJFCC


Abstract: Sequential learning problems include speech, cursive handwriting, time series forecasting and protein sequence prediction. Both speech and cursive handwriting recognition are challenging problems for pattern recognition systems, speech in particular. A peculiar characteristic of these problems is that the signal or pattern evolves with time, so modeling long-time dependencies in the pattern is a major challenge. Hidden Markov models (HMM) have been applied to these types of problems. Due to some obvious shortcomings of HMM, neural networks and their hybrids were also explored and applied. Feature variability in sequence learning remains a challenging problem. In this paper, we analyze the problem, present some methods for feature variance suppression in character recognition, and review some research efforts in the modification of neural networks and their applications. We propose a structure for a state-based neural network.

Index Terms: Sequence learning, feature variability, neural network.



“…Combining bidirectional networks with LSTM gives Bidirectional LSTM (BLSTM), which has demonstrated excellent performance in phoneme recognition [5] and keyword spotting [6].…”

Section: Bidirectional LSTM (mentioning)

confidence: 99%

“…In this work we build in context information by including the outputs of a bidirectional Long Short-Term Memory (BLSTM) recurrent neural network [4,5] in the feature functions. Similar neural network architectures have been successfully applied to speech or emotion recognition related tasks [6,5,7], where they exploit contextual information whenever speech production or perception is influenced by emotion, strong accents, or background noise. In contrast to [6], our keyword spotting approach uses BLSTM for phoneme discrimination and not for the recognition of whole keywords.…”

Section: Introduction (mentioning)

confidence: 99%

“…Similar neural network architectures have been successfully applied to speech or emotion recognition related tasks [6,5,7], where they exploit contextual information whenever speech production or perception is influenced by emotion, strong accents, or background noise. In contrast to [6], our keyword spotting approach uses BLSTM for phoneme discrimination and not for the recognition of whole keywords. As well as reducing the complexity of the network, the use of phonemes makes it applicable to any keyword spotting task.…”

Section: Introduction (mentioning)

confidence: 99%


Robust discriminative keyword spotting for emotionally colored spontaneous speech using bidirectional LSTM networks

Wöllmer, Eyben, Keshet et al.

2009

2009 IEEE International Conference on Acoustics, Speech and Signal Processing

Self Cite


In this paper we propose a new technique for robust keyword spotting that uses bidirectional Long Short-Term Memory (BLSTM) recurrent neural nets to incorporate contextual information in speech decoding. Our approach overcomes the drawbacks of generative HMM modeling by applying a discriminative learning procedure that non-linearly maps speech features into an abstract vector space. By incorporating the outputs of a BLSTM network into the speech features, it is able to make use of past and future context for phoneme predictions. The robustness of the approach is evaluated on a keyword spotting task using the HUMAINE Sensitive Artificial Listener (SAL) database, which contains accented, spontaneous, and emotionally colored speech. The test is particularly stringent because the system is not trained on the SAL database, but only on the TIMIT corpus of read speech. We show that our method prevails over a discriminative keyword spotter without BLSTM-enhanced feature functions, which in turn has been proven to outperform HMM-based techniques.


“…Hybrid or Tandem architectures that combine discriminatively trained neural networks with Gaussian mixture modeling are widely used for speech recognition [5,6]. However, BLSTM is a relatively new architecture that has so far been applied to keyword spotting in only three works: in [7] and [8] the framewise phoneme predictions of BLSTM (without CTC) were shown to enhance keyword spotting performance of discriminative and generative models, respectively; and in [9] a keyword spotter using only BLSTM-CTC was introduced. The disadvantage of the latter method is that it has a separate output unit for each keyword, which requires excessive amounts of training data for large vocabularies, and also means the network must be retrained when new keywords are added.…”

Section: Introduction (mentioning)

confidence: 99%

Spoken term detection with Connectionist Temporal Classification: A novel hybrid CTC-DBN decoder

Wöllmer, Eyben, Schuller et al.

2010

2010 IEEE International Conference on Acoustics, Speech and Signal Processing


This paper proposes a novel system for robust keyword detection in continuous speech. Our decoder is composed of a bidirectional Long Short-Term Memory recurrent neural network using a Connectionist Temporal Classification (CTC) output layer, and a Dynamic Bayesian Network (DBN). The CTC network exploits bidirectional context information to reliably identify phonemes, whereas the DBN is able to discriminate between keywords and arbitrary speech while explicitly modeling substitutions, deletions, and insertions in the CTC phoneme output string. Our technique is vocabulary independent and does not require an explicit garbage model. Experiments show that our system architecture prevails over a standard Hidden Markov Model approach.
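The decoding idea in the abstract above can be illustrated with a rough sketch. It is not the authors' implementation: the CTC step here is plain best-path decoding (take the most likely label per frame, merge repeats, drop blanks), and the Dynamic Bayesian Network is replaced by a much simpler stand-in, an approximate substring match that counts substitutions, deletions, and insertions. The blank symbol `_`, the function names, and the toy phoneme labels are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol for the CTC output layer


def ctc_best_path(frame_posteriors, labels):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [labels[max(range(len(p)), key=p.__getitem__)] for p in frame_posteriors]
    collapsed = [symbol for symbol, _ in groupby(best)]
    return [symbol for symbol in collapsed if symbol != BLANK]


def min_edit_distance_in(keyword, phonemes):
    """Smallest edit distance between `keyword` and any substring of `phonemes`.

    A simple stand-in for the DBN: it only counts errors, it does not model
    their probabilities.
    """
    # Row 0 is all zeros so that a match may start anywhere in `phonemes`.
    prev = [0] * (len(phonemes) + 1)
    for i, k in enumerate(keyword, start=1):
        curr = [i] + [0] * len(phonemes)
        for j, p in enumerate(phonemes, start=1):
            curr[j] = min(
                prev[j - 1] + (k != p),  # match or substitution
                prev[j] + 1,             # keyword phoneme missing from the output
                curr[j - 1] + 1,         # extra phoneme inserted in the output
            )
        prev = curr
    # The match may also end anywhere, so take the minimum over the last row.
    return min(prev)


# Toy example: four output units (blank, k, ae, t) over eight frames.
labels = [BLANK, "k", "ae", "t"]
frames = [
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.2, 0.7, 0.1, 0.0],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.0, 0.8, 0.1],
    [0.1, 0.0, 0.1, 0.8],
    [0.9, 0.0, 0.0, 0.1],
]
decoded = ctc_best_path(frames, labels)
keyword_errors = min_edit_distance_in(["k", "ae", "t"], decoded)
```

A keyword could then be reported when `keyword_errors` falls below a chosen threshold. The threshold, like the rest of the sketch, is an assumption and is not taken from the paper.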

