2006
DOI: 10.1109/tsa.2005.860349

Automatic speech recognition with an adaptation model motivated by auditory processing

Abstract: The Mel-Frequency Cepstral Coefficient (MFCC) or Perceptual Linear Prediction (PLP) feature extraction typically used for automatic speech recognition (ASR) employs several principles which have known counterparts in the cochlea and auditory nerve: frequency decomposition, mel- or Bark-warping of the frequency axis, and compression of amplitudes. It seems natural to ask if one can profitably employ a counterpart of the next physiological processing step, synaptic adaptation. We therefore incorporated a simplified…

Cited by 55 publications (30 citation statements)
References 27 publications
“…But Eq. (11), which is designed to imitate synaptic adaptation, is demonstrated to work better than RASTA [22]. In addition, both synaptic adaptation and temporal integration achieve good performance in [24], and combining both of them achieves a further performance gain.…”
Section: Recognition Results on an Isolated-Word Task
confidence: 99%
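
Neither Eq. (11) of the citing paper nor the exact temporal-integration scheme of [24] is reproduced in the excerpt, so the sketch below is only a hedged illustration of the two ideas it names: temporal integration modelled as a first-order leaky integrator applied to each feature channel, and one simple way of combining it with an adaptation-filtered stream, namely frame-wise concatenation. The function names and the smoothing constant are illustrative assumptions, not values from the cited works.

# Illustrative sketch only: generic temporal integration and stream combination,
# not Eq. (11) of the citing paper and not the exact scheme of [24].
import numpy as np
from scipy.signal import lfilter

def leaky_integrate(features, alpha=0.9):
    # Per-channel temporal integration: y[n] = alpha*y[n-1] + (1 - alpha)*x[n].
    # features: array of shape (num_frames, num_channels).
    return lfilter([1.0 - alpha], [1.0, -alpha], features, axis=0)

def combine_streams(adapted, integrated):
    # One plausible combination: concatenate the two feature streams per frame.
    return np.concatenate([adapted, integrated], axis=1)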
“…This mechanism can enhance sound detection and improve immunity to stationary noise. In [22], synaptic adaptation is implemented using a first-order IIR filter, and it has been shown to outperform RASTA processing. The transfer function of this filter is…”
Section: Synaptic Adaptation
confidence: 99%
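
The excerpt truncates the transfer function taken from [22], so it is not reproduced here. As a hedged sketch only, the snippet below implements a generic first-order IIR adaptation filter of the form H(z) = (1 - z^-1)/(1 - a*z^-1); its output is large at onsets and decays toward zero for stationary input, which matches the qualitative behaviour the passage attributes to synaptic adaptation. The pole value a and the function name are illustrative assumptions, not the coefficients used in [22].

# Hedged illustration, not the filter from [22]: a generic first-order IIR
# adaptation filter H(z) = (1 - z^-1) / (1 - a*z^-1).
import numpy as np
from scipy.signal import lfilter

def adaptation_filter(envelope, a=0.98):
    # envelope: 1-D array, e.g. the compressed envelope of one auditory channel.
    # a: pole position (0 < a < 1); values closer to 1 give slower adaptation.
    return lfilter([1.0, -1.0], [1.0, -a], envelope)

# Toy usage: a step-like envelope yields a strong onset response that decays
# toward zero over the stationary portion, i.e. adaptation-like behaviour.
env = np.concatenate([np.zeros(50), np.ones(200)])
adapted = adaptation_filter(env)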
“…Several noise compensation methods, such as ETSI AFE and VTS followed by feature standardization using CMVN, are also compared in this study. Furthermore, the classification performance of the hybrid features is also compared with a multi-condition/multi-style classifier [9,10] trained with standard cepstral features. It will be shown that multi-style training with cepstral features significantly improves robustness; however, it is highly sensitive to mismatch between the noise types contaminating the training and test data.…”
Section: Hybrid Features
confidence: 99%
“…A large amount of training data is required to retrain the system for a new environment. To make the cepstral representations of speech less sensitive to noise, several techniques such as cepstral mean and variance normalization (CMVN) [8] and multi-condition/multi-style training [9,10] have been proposed to explicitly reduce the effects of noise on spectral representations, with the aim of approaching the optimal performance achieved when training and testing conditions are matched [11]. State-of-the-art feature compensation methods for the cepstral representation of speech include the ETSI advanced front end (AFE) [12] and vector Taylor series (VTS) [13,14].…”
Section: Introduction
confidence: 99%
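
CMVN [8], mentioned in the passage above, is a standard per-utterance normalization of cepstral features to zero mean and unit variance along time. A minimal sketch follows, assuming the features are arranged as (num_frames × num_cepstra); the epsilon guard is an implementation detail, not part of any cited method.

# Minimal CMVN sketch: normalize each cepstral dimension to zero mean and
# unit variance over the utterance.
import numpy as np

def cmvn(features, eps=1e-8):
    # features: array of shape (num_frames, num_cepstra), e.g. MFCCs.
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)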
“…[4,5]), a number of feature extraction methods that exploit temporal information have been proposed in recent years. These systems typically provide recognition accuracy exceeding that obtained with MFCC or PLP features in the presence of noise and other adverse conditions [6,7,8], especially if they are combined with a traditional recognition system in some fashion.…”
Section: Introduction
confidence: 99%