Selection and enhancement of Gabor filters for automatic speech recognition

Kovács, George L.; Tóth, László; Van Compernolle, Dirk

doi:10.1007/s10772-014-9246-4

Cited by 11 publications

(6 citation statements)

References 27 publications


“…A deep belief network-deep neural network (DBN-DNN) with four hidden layers having ten frames of input temporal context and a sigmoid nonlinearity is discriminatively trained using the training data and a tri-gram language model is used in the ASR decoding. We compare the ASR performance of the proposed modulation filtering approach with traditional mel filter bank energy (MFB) features, power normalized filter bank energy (PFB) features (Kim and Stern, 2012), advanced ETSI front-end (ETS) (ETSI, 2002), RASTA features (RAS) (Hermansky and Morgan, 1994), LDA-based features (Van Vuuren and Hermansky, 1997), spectro-temporal Gabor filters with filter selection based features (GAB) (Kovacs et al, 2015), MHEC features (MHE) (Sadjadi and Hansen, 2015), and auditory spectrogram features (ASp) (Chi et al, 2005). The results for the proposed data-driven modulation filtering obtained from MFB and ASp are also shown here.…”

Section: Experiments: A. Speech Recognition System (mentioning)
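The quoted setup feeds the DBN-DNN with ten frames of input temporal context. A common way to build such an input is to splice each feature frame with its neighbours into one long vector. The sketch below is a minimal illustration, assuming five frames on each side of the current frame and edge padding at utterance boundaries; neither detail is stated in the excerpt, and `splice_frames` is a hypothetical helper.

```python
from typing import List

Frame = List[float]


def splice_frames(frames: List[Frame], left: int = 5, right: int = 5) -> List[Frame]:
    """Concatenate each frame with `left` preceding and `right` following frames.

    Indices outside the utterance are clamped to the first/last frame
    (edge padding), which is an assumed convention.
    """
    if not frames:
        return []
    n = len(frames)
    spliced: List[Frame] = []
    for t in range(n):
        window: Frame = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to a valid frame index
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


if __name__ == "__main__":
    # 20 hypothetical frames of 40-dimensional features
    utterance = [[float(t)] * 40 for t in range(20)]
    inputs = splice_frames(utterance)
    print(len(inputs), len(inputs[0]))  # 20 frames, 11 * 40 = 440 inputs each
```

With 40-dimensional frames (a hypothetical dimensionality), this window gives 11 × 40 = 440 network inputs per frame.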


“…For learning the temporal modulation filters in a data-driven manner, the linear discriminant analysis (LDA) has been explored (Van Vuuren and Hermansky, 1997; Hung and Lee, 2006). A data-driven approach for parameter selection of Gabor filter set has been recently studied (Kovacs et al, 2015; Schadler et al, 2012). A recent approach to separable spectrotemporal Gabor filter bank features shows that spectral and temporal processing can be performed independently (Schadler and Kollmeier, 2013).…”

mentioning
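A spectro-temporal Gabor filter is a 2-D kernel over frequency and time: a localized envelope multiplied by a sinusoidal carrier tuned to a spectral and a temporal modulation frequency. The sketch below uses a Gaussian envelope and subtracts the kernel mean so that flat spectrogram regions give zero response; the cited works may use a different envelope and parameterization, and all names and parameter values here are hypothetical. Roughly speaking, a parameter selection scheme like the one referenced decides which (`omega_f`, `omega_t`) settings to keep in the filter set.

```python
import math
from typing import List

Grid = List[List[float]]  # indexed [frequency][time]


def gabor_kernel(size_f: int, size_t: int, omega_f: float, omega_t: float,
                 sigma_f: float, sigma_t: float) -> Grid:
    """Real 2-D Gabor kernel: Gaussian envelope times a cosine carrier.

    omega_f / omega_t are spectral / temporal modulation frequencies in
    radians per channel / per frame; sigma_f / sigma_t set the envelope width.
    """
    cf, ct = (size_f - 1) / 2.0, (size_t - 1) / 2.0  # kernel centre
    kernel: Grid = []
    for k in range(size_f):
        row = []
        for n in range(size_t):
            dk, dn = k - cf, n - ct
            envelope = math.exp(-dk * dk / (2 * sigma_f ** 2) - dn * dn / (2 * sigma_t ** 2))
            row.append(envelope * math.cos(omega_f * dk + omega_t * dn))
        kernel.append(row)
    # Subtract the mean so a flat (constant) spectrogram region gives zero output.
    mean = sum(map(sum, kernel)) / (size_f * size_t)
    return [[v - mean for v in row] for row in kernel]


def gabor_response(spec: Grid, kernel: Grid, f0: int, t0: int) -> float:
    """Sliding dot product of the kernel with the patch whose corner is spec[f0][t0]."""
    return sum(kernel[k][n] * spec[f0 + k][t0 + n]
               for k in range(len(kernel)) for n in range(len(kernel[0])))
```

Evaluating `gabor_response` at every (`f0`, `t0`) position yields one feature map per filter.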



Unsupervised modulation filter learning for noise-robust speech recognition

Agrawal

Ganapathy

2017

The Journal of the Acoustical Society of America


The modulation filtering approach to robust automatic speech recognition (ASR) is based on enhancing perceptually relevant regions of the modulation spectrum while suppressing the regions susceptible to noise. In this paper, a data-driven unsupervised modulation filter learning scheme is proposed using a convolutional restricted Boltzmann machine. The initial filter is learned using the speech spectrogram, while subsequent filters are learned using residual spectrograms. The modulation filtered spectrograms are used for ASR experiments on noisy and reverberant speech, where these features provide significant improvements over other robust features. Furthermore, the application of the proposed method to semi-supervised learning is investigated.
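The abstract describes filtering a spectrogram along time and learning later filters on residual spectrograms. The sketch below shows only those two operations, under stated assumptions: a fixed, hand-chosen FIR kernel stands in for the filter that the paper learns with a convolutional restricted Boltzmann machine, and the residual is assumed to be the spectrogram minus its filtered version. `temporal_filter` and `residual` are hypothetical helpers.

```python
from typing import List

Grid = List[List[float]]  # spectrogram indexed [frequency][time]


def temporal_filter(spec: Grid, h: List[float]) -> Grid:
    """Filter every frequency channel along time with the FIR kernel h.

    Computes a centred sliding dot product with zero padding, so the output has
    the same number of frames (identical to convolution when h is symmetric).
    """
    half = len(h) // 2
    out: Grid = []
    for channel in spec:
        frames = len(channel)
        row = []
        for t in range(frames):
            acc = 0.0
            for i, coeff in enumerate(h):
                idx = t + i - half
                if 0 <= idx < frames:  # zero padding outside the utterance
                    acc += coeff * channel[idx]
            row.append(acc)
        out.append(row)
    return out


def residual(spec: Grid, filtered: Grid) -> Grid:
    """Assumed residual: the part of the spectrogram the filter did not capture."""
    return [[s - y for s, y in zip(rs, ry)] for rs, ry in zip(spec, filtered)]


if __name__ == "__main__":
    spec = [[0.0, 4.0, 8.0, 4.0, 0.0]]                  # one channel, five frames
    smooth = temporal_filter(spec, [0.25, 0.5, 0.25])  # hypothetical low-pass kernel
    print(smooth)                  # [[1.0, 4.0, 6.0, 4.0, 1.0]]
    print(residual(spec, smooth))  # [[-1.0, 0.0, 2.0, 0.0, -1.0]]
```

In the described scheme, the next filter would be learned from the residual rather than from the original spectrogram.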


“…1, this is represented by the dark grey areas. The feature extraction layer then takes these patches and extracts 9 features from each of them (meaning that this layer consists of altogether 54 linear neurons, 6 times 9), and sends the output to the hidden layer (which in our experiments had 4000 neurons [12], using the sigmoid activation function). From this point on, the system behaves just like any other conventional neural net; the hidden layer processes the information and passes it on to the output layer (which in our experiments had 39 neurons, corresponding to the 39 phone classes) that provides the output.…”

Section: Joint Optimization Of Neural Net Classifiers and Spectro-tem… (mentioning)
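The excerpt describes a feature extraction layer of linear neurons (6 patches × 9 features = 54), a 4000-unit sigmoid hidden layer and 39 phone-class outputs. The forward pass below is a minimal sketch of that layout under assumptions the excerpt does not state: no bias terms, a softmax output layer, a separate filter set per patch, and a hypothetical patch dimension of 20 values. It is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]  # one row of weights per neuron


def linear(W: Matrix, x: Sequence[float]) -> List[float]:
    """Linear neurons without bias: y[i] = sum_j W[i][j] * x[j]."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def sigmoid(v: Sequence[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(z - m) for z in v]
    total = sum(e)
    return [z / total for z in e]


def feature_layer(patches: Sequence[Sequence[float]], filter_sets: Sequence[Matrix]) -> List[float]:
    """Apply each patch's own filter set and concatenate the linear outputs."""
    features: List[float] = []
    for patch, filters in zip(patches, filter_sets):
        features.extend(linear(filters, patch))
    return features


def forward(patches, filter_sets, W_hidden: Matrix, W_out: Matrix) -> List[float]:
    features = feature_layer(patches, filter_sets)  # 6 patches x 9 filters = 54 values
    hidden = sigmoid(linear(W_hidden, features))     # sigmoid hidden layer
    return softmax(linear(W_out, hidden))            # one probability per phone class


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_patches, n_filters, patch_dim = 6, 9, 20  # patch_dim is hypothetical
    n_hidden, n_phones = 4000, 39
    patches = [[rng.random() for _ in range(patch_dim)] for _ in range(n_patches)]
    filter_sets = [random_matrix(n_filters, patch_dim, rng) for _ in range(n_patches)]
    W_hidden = random_matrix(n_hidden, n_patches * n_filters, rng)
    W_out = random_matrix(n_phones, n_hidden, rng)
    probs = forward(patches, filter_sets, W_hidden, W_out)
    print(len(probs), round(sum(probs), 6))
```

Because the feature extraction layer is just another set of weights, the same forward pass can be differentiated end to end.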


“…Traditionally, these two steps (feature extraction and recognition) are performed separately. In earlier papers [11,12], we showed that the spectro-temporal feature extraction step and the ANN-based recognition step could be combined, and the parameters needed for the two phases could be trained together. Our solution was based on the observation that the spectro-temporal filters can be treated as special types of neurons, and so the standard backpropagation training algorithm of ANNs can be extended to the feature extraction step as well.…”

Section: Introduction (mentioning)
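Treating a spectro-temporal filter as a linear neuron means its coefficients receive gradients through the ordinary chain rule, like any other weight. The toy sketch below assumes one filter neuron feeding one sigmoid unit with a squared-error loss, a deliberately tiny stand-in for the full network; it computes the gradient with respect to the filter weights and takes one gradient-descent step. All names and values are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(w: List[float], v: float, x: List[float]) -> Tuple[float, float]:
    y = sum(wj * xj for wj, xj in zip(w, x))  # filter neuron: linear response to the patch x
    o = sigmoid(v * y)                         # downstream sigmoid unit with weight v
    return y, o


def loss(w: List[float], v: float, x: List[float], target: float) -> float:
    _, o = forward(w, v, x)
    return 0.5 * (o - target) ** 2


def gradients(w: List[float], v: float, x: List[float], target: float) -> Tuple[List[float], float]:
    y, o = forward(w, v, x)
    delta_out = (o - target) * o * (1.0 - o)  # dL/d(v*y), via the sigmoid derivative
    grad_v = delta_out * y                     # gradient for the downstream weight
    delta_filter = delta_out * v               # error backpropagated to the filter neuron
    grad_w = [delta_filter * xj for xj in x]   # same rule as for any linear layer
    return grad_w, grad_v


def sgd_step(w: List[float], v: float, x: List[float], target: float,
             lr: float = 0.5) -> Tuple[List[float], float]:
    grad_w, grad_v = gradients(w, v, x, target)
    return [wj - lr * g for wj, g in zip(w, grad_w)], v - lr * grad_v


if __name__ == "__main__":
    w, v, x, target = [0.2, -0.1, 0.05], 0.7, [1.0, 0.5, -2.0], 1.0
    before = loss(w, v, x, target)
    w, v = sgd_step(w, v, x, target)
    print(before, loss(w, v, x, target))  # loss decreases after the update
```

Comparing `gradients` against finite differences of `loss` is a quick check that the filter update is the true backpropagated gradient.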


Joint Optimization of Spectro-Temporal Features and Deep Neural Nets for Robust Automatic Speech Recognition

Kovács

Tóth

2015

Acta Cybern

Self Cite


In speech recognition, feature extraction and acoustic model training are traditionally done in two separate steps. Here, instead, we use a framework that combines spectro-temporal feature extraction and the training of neural-network-based acoustic models into a single process. We found earlier that this approach can be successfully applied to speech recognition. In this paper, we propose two further improvements to our method based on recent advances in neural net technology, and extend our evaluation to speech contaminated with new types of noise. By repeating our experiments on TIMIT phone recognition tasks using clean and noise-contaminated speech, we compare the recognition performance of the original framework with that of our new, modified framework. The results indicate that both modifications significantly improve the recognition performance of our framework. Moreover, we show that these modifications allow us to achieve substantially better performance than we obtained earlier.


“…In our earlier publications on the topic of joint optimisation of spectro-temporal features and acoustic models, the idea of incorporating delta (∆) and acceleration (∆∆) coefficients was raised multiple times [125,126]. This idea was based on the results of earlier experiments where the addition of ∆ and ∆∆ coefficients to spectro-temporal features increased the accuracy of the recognition process [123].…”

Section: Delta and Acceleration Coefficients (mentioning)
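Delta (∆) coefficients are commonly computed by a regression over a window of ±N frames, d_t = Σ_{n=1..N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1..N} n²), and acceleration (∆∆) coefficients by applying the same operation to the deltas. The sketch below assumes N = 2 and replicates edge frames at utterance boundaries, a common convention; the cited works may compute them differently, and the function names are hypothetical.

```python
from typing import List

Frames = List[List[float]]  # one list of coefficients per frame


def deltas(frames: Frames, N: int = 2) -> Frames:
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..N.

    Frames outside the utterance are replaced by the first/last frame.
    """
    T = len(frames)
    if T == 0:
        return []
    denom = 2 * sum(n * n for n in range(1, N + 1))
    dim = len(frames[0])
    out: Frames = []
    for t in range(T):
        d = [0.0] * dim
        for n in range(1, N + 1):
            nxt = frames[min(t + n, T - 1)]
            prv = frames[max(t - n, 0)]
            for i in range(dim):
                d[i] += n * (nxt[i] - prv[i])
        out.append([value / denom for value in d])
    return out


def append_deltas_and_accelerations(frames: Frames, N: int = 2) -> Frames:
    """Concatenate static, delta and acceleration coefficients for every frame."""
    d = deltas(frames, N)
    dd = deltas(d, N)  # acceleration = delta of the deltas
    return [c + x + y for c, x, y in zip(frames, d, dd)]


if __name__ == "__main__":
    ramp = [[float(t)] for t in range(10)]  # one coefficient rising by 1 per frame
    print(deltas(ramp)[5])                  # [1.0]: slope of the ramp
```

For 13-dimensional static frames, appending deltas and accelerations gives 39 values per frame.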


Noise Robust Automatic Speech Recognition Based on Spectro-Temporal Techniques

Kovács

Self Cite

