2014
DOI: 10.1007/s10772-014-9246-4

Selection and enhancement of Gabor filters for automatic speech recognition

Abstract: Motivated by neurophysiological studies, the use of Gabor filters as acoustic feature extractors for speech recognition has received increasing attention in the new millennium. As the optimal parametrization of these filters is not obvious, many researchers employ feature selection methods to find the best filter set. In this study, however, we argue that these kinds of feature selection methods cannot fulfill this task, and we demonstrate this with experimental results.…

Cited by 11 publications (6 citation statements)
References 27 publications
“…A deep belief network-deep neural network (DBN-DNN) with four hidden layers, ten frames of input temporal context, and a sigmoid nonlinearity is discriminatively trained on the training data, and a tri-gram language model is used in the ASR decoding. We compare the ASR performance of the proposed modulation filtering approach with traditional mel filter bank energy (MFB) features, power normalized filter bank energy (PFB) features (Kim and Stern, 2012), the advanced ETSI front-end (ETS) (ETSI, 2002), RASTA features (RAS) (Hermansky and Morgan, 1994), LDA-based features (Van Vuuren and Hermansky, 1997), spectro-temporal Gabor filters with filter-selection-based features (GAB) (Kovacs et al., 2015), MHEC features (MHE) (Sadjadi and Hansen, 2015), and auditory spectrogram features (ASp) (Chi et al., 2005). The results for the proposed data-driven modulation filtering obtained from MFB and ASp are also shown here.…”
Section: Experiments: A. Speech Recognition System (mentioning, confidence: 99%)
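The "ten frames of input temporal context" in this excerpt refers to frame splicing: each feature frame is stacked with its neighbours before being fed to the DNN. A minimal numpy sketch; the symmetric +/-5 window and edge padding are assumptions, since the excerpt does not give the exact layout:

```python
import numpy as np

def splice_frames(feats, left=5, right=5):
    """Stack each frame with its neighbours to give the DNN temporal
    context. feats is (num_frames, num_dims); edge frames are padded
    by repetition. The window (+/-5 frames here) is an assumption."""
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    num_frames = feats.shape[0]
    windows = [padded[i : i + num_frames] for i in range(left + right + 1)]
    return np.hstack(windows)   # (num_frames, num_dims * (left + right + 1))
```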
“…For learning the temporal modulation filters in a data-driven manner, linear discriminant analysis (LDA) has been explored (Van Vuuren and Hermansky, 1997; Hung and Lee, 2006). A data-driven approach to parameter selection for a Gabor filter set has recently been studied (Kovacs et al., 2015; Schadler et al., 2012). A recent approach to separable spectro-temporal Gabor filter bank features shows that spectral and temporal processing can be performed independently (Schadler and Kollmeier, 2013).…”
(mentioning, confidence: 99%)
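The Gabor filters referenced throughout are two-dimensional spectro-temporal filters: a sinusoidal carrier under a localized envelope, correlated with a log-mel spectrogram. A minimal sketch with a Gaussian envelope; the exact parametrization differs between the cited works (Schadler et al., for instance, use a Hann envelope), so the shapes and parameter names below are illustrative assumptions:

```python
import numpy as np

def gabor_filter_2d(num_frames, num_channels, omega_t, omega_f,
                    sigma_t=6.0, sigma_f=2.0):
    """Spectro-temporal Gabor filter: a 2-D cosine carrier with temporal
    (omega_t) and spectral (omega_f) modulation frequencies in
    radians/sample, windowed by a Gaussian envelope. Illustrative
    parametrization only; the cited filter banks differ in detail."""
    t = np.arange(num_frames) - (num_frames - 1) / 2.0
    f = np.arange(num_channels) - (num_channels - 1) / 2.0
    T, F = np.meshgrid(t, f, indexing="ij")
    envelope = np.exp(-0.5 * ((T / sigma_t) ** 2 + (F / sigma_f) ** 2))
    carrier = np.cos(omega_t * T + omega_f * F)
    g = envelope * carrier
    return g - g.mean()   # remove the DC response, a common normalization

# Each filter yields one feature per spectrogram patch via 2-D correlation.
```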
“…1, this is represented by the dark grey areas. The feature extraction layer then takes these patches and extracts 9 features from each of them (meaning that this layer consists of 54 linear neurons altogether, 6 times 9), and sends the output to the hidden layer (which in our experiments had 4000 neurons [12], using the sigmoid activation function). From this point on, the system behaves just like any other conventional neural net; the hidden layer processes the information and passes it on to the output layer (which in our experiments had 39 neurons, corresponding to the 39 phone classes) that provides the output.…”
Section: Joint Optimization of Neural Net Classifiers and Spectro-temporal… (mentioning, confidence: 99%)
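Read as an architecture, this excerpt fully specifies the layer sizes (6 patches × 9 linear features = 54 units, then 4000 sigmoid units, then 39 outputs), so a forward-pass sketch is straightforward. A minimal numpy version; the softmax output and the bias terms are assumptions not stated in the excerpt:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(patches, W_feat, W_hid, b_hid, W_out, b_out):
    """Forward pass of the joint model sketched in the excerpt:
    6 flattened spectrogram patches -> 9 linear features each (54 units)
    -> 4000 sigmoid hidden units -> 39 phone posteriors.
    patches: list of 6 vectors; W_feat: list of 6 (patch_dim, 9) matrices;
    W_hid: (54, 4000); W_out: (4000, 39)."""
    feats = np.concatenate([p @ W for p, W in zip(patches, W_feat)])  # (54,)
    hidden = sigmoid(feats @ W_hid + b_hid)                           # (4000,)
    logits = hidden @ W_out + b_out                                   # (39,)
    probs = np.exp(logits - logits.max())   # assumed softmax output
    return probs / probs.sum()
```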
“…Traditionally, these two steps (feature extraction and recognition) are performed separately. In earlier papers [11, 12], we showed that the spectro-temporal feature extraction step and the ANN-based recognition step could be combined, and the parameters needed for the two phases could be trained together. Our solution was based on the observation that the spectro-temporal filters can be treated as special types of neurons, and so the standard backpropagation training algorithm of ANNs can be extended to the feature extraction step as well.…”
Section: Introduction (mentioning, confidence: 99%)
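The observation that makes the joint training work is that a linear spectro-temporal filter is just a neuron without a nonlinearity, so the chain rule passes through it unchanged. A minimal sketch of that single extra backward step, with hypothetical variable names (the cited papers derive this inside a full network):

```python
import numpy as np

# Forward: feature = patch @ w   (patch: (patch_dim,), w: (patch_dim,))
# Given d_feature = dLoss/d_feature, propagated down from the classifier
# layers by standard backprop, the filter-weight gradient is simply:
def filter_grad(patch, d_feature):
    return d_feature * patch   # dLoss/dw, same shape as the filter

# so the filter is updated with the same SGD rule as any other weight:
# w -= learning_rate * filter_grad(patch, d_feature)
```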
“…In our earlier publications on the topic of joint optimisation of spectro-temporal features and acoustic models, the idea of incorporating delta (∆) and acceleration (∆∆) coefficients was raised multiple times [125, 126]. This idea was based on the results of earlier experiments, where the addition of ∆ and ∆∆ coefficients to spectro-temporal features increased the accuracy of the recognition process [123].…”
Section: Delta and Acceleration Coefficients (mentioning, confidence: 99%)
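For reference, ∆ coefficients are conventionally computed by linear regression over a few neighbouring frames, and ∆∆ by applying the same operation to the deltas. A minimal sketch using the standard regression formula, assuming the common ±2 frame window (the cited experiments do not specify theirs):

```python
import numpy as np

def deltas(feats, window=2):
    """Delta coefficients via the standard regression formula
    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), k = 1..window.
    feats is (num_frames, num_dims); edges are padded by repetition.
    The window of 2 is the common default, assumed here."""
    denom = 2 * sum(k * k for k in range(1, window + 1))
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    num_frames = feats.shape[0]
    d = np.zeros_like(feats, dtype=float)
    for k in range(1, window + 1):
        d += k * (padded[window + k : window + k + num_frames]
                  - padded[window - k : window - k + num_frames])
    return d / denom

# Acceleration coefficients are the deltas of the deltas:
# acc = deltas(deltas(feats))
```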