Automatic speech recognition based on cepstral coefficients and a mel-based discrete energy operator

Tolba, Hesham; O’Shaughnessy, D.

doi:10.1109/icassp.1998.675429

Cited by 5 publications

(5 citation statements)

References 9 publications


“…The energy operator has been applied successfully to demodulation and has many attractive features such as simplicity, efficiency, and adaptability to instantaneous signal variations [3]. The attractive physical interpretation of the energy operator has led to its use as an ASR feature extractor in various forms, see for example [12], [13].…”

Section: Quadratic Operators and Energy Spectrum (mentioning, confidence: 99%)
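The demodulation use mentioned above can be illustrated with the discrete Teager-Kaiser operator Ψ[x](n) = x(n)² − x(n−1)x(n+1) and the DESA-2 energy separation algorithm. For x(n) = A cos(Ωn + φ), Ψ[x] = A² sin²Ω. Applying Ψ to the symmetric difference y(n) = x(n+1) − x(n−1) gives 4A² sin⁴Ω, so Ω and |A| follow in closed form. This is a minimal NumPy sketch under those assumptions, not code from any of the cited papers, and the function names are illustrative.

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator on interior samples:
    Psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, eps=1e-12):
    """DESA-2 energy separation for a single AM-FM component.

    Returns (omega, amp): instantaneous frequency in rad/sample and
    amplitude, for samples n = 2 .. N-3 of x.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]                 # y(n) = x(n+1) - x(n-1), for n = 1 .. N-2
    psi_x = teager_kaiser(x)[1:-1]     # Psi[x] aligned to n = 2 .. N-3
    psi_y = teager_kaiser(y)           # Psi[y] for n = 2 .. N-3
    # 1 - Psi[y] / (2 Psi[x]) = cos(2 * omega) for a pure AM-FM component
    ratio = np.clip(1.0 - psi_y / (2.0 * psi_x + eps), -1.0, 1.0)
    omega = 0.5 * np.arccos(ratio)
    # |A| = 2 Psi[x] / sqrt(Psi[y])
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return omega, amp
```

For a pure cosine, the estimates are exact up to rounding. On real speech the signal would normally be bandpass filtered first, because DESA assumes a single AM-FM component.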

“…In general, using (5) the sum of any quadratic operator output (e.g., see [4], [1]) can be expressed as (12) where are arbitrary constants. For narrowband signals , can be assumed constant around and the short-time average of can be expressed as (13) i.e., the difference between the log of any time-frequency distribution produced by the generalized ASR front-end in Fig.…”

Section: Quadratic Operators and Energy Spectrum (mentioning, confidence: 99%)

“…Given the similarity between the time-frequency distributions of quadratic operators it is expected that ASR performance will also be similar for various front-ends that use short-time averages of quadratic operators as features. However, as the size of the short-time window decreases and/or the bandwidth of the filter increases the differences among are no longer time-invariant, i.e., and significant ASR performance differences may arise between various front-ends (see for example [12] where the energy operator is applied to the unfiltered signal). The equivalence between , and as features (in the cepstrum domain) for ASR is experimentally shown in Section IV.…”

Section: Quadratic Operators and Energy Spectrum (mentioning, confidence: 99%)
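The narrowband argument can be checked numerically. Take x(n) = a(n) cos(Ωn) with a slowly varying envelope a(n). Then Ψ[x](n) ≈ a²(n) sin²Ω, while x²(n) averages to about a²(n)/2. The difference between the logs of their short-time averages should therefore stay near the constant log(2 sin²Ω) from frame to frame. The carrier, envelope and frame sizes below are arbitrary assumptions. This is only an illustration of the argument, not a reconstruction of the paper's equations (12) and (13).

```python
import numpy as np

def teager_kaiser(x):
    """Psi[x](n) = x(n)^2 - x(n-1) x(n+1) on interior samples."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def frame_means(v, frame_len, hop):
    """Short-time averages of v over frames of frame_len samples, hop apart."""
    starts = range(0, len(v) - frame_len + 1, hop)
    return np.array([v[s:s + frame_len].mean() for s in starts])

omega = 0.3                                   # carrier frequency in rad/sample (assumed)
n = np.arange(20000)
envelope = 1.0 + 0.3 * np.cos(0.0005 * n)     # slowly varying amplitude (assumed)
x = envelope * np.cos(omega * n)

frame_len, hop = 400, 200
energy = frame_means(x[1:-1] ** 2, frame_len, hop)      # short-time average of x^2
teager = frame_means(teager_kaiser(x), frame_len, hop)  # short-time average of Psi[x]
log_diff = np.log(teager) - np.log(energy)
predicted = np.log(2.0 * np.sin(omega) ** 2)
print(f"log difference range: [{log_diff.min():.3f}, {log_diff.max():.3f}], "
      f"predicted: {predicted:.3f}")
```

Shortening frame_len or making the envelope change faster tends to widen the spread of log_diff. This matches the caveat above about short windows and wide filters.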


Time-frequency distributions for automatic speech recognition

Potamianos, Maragos (2001). IEEE Trans. Speech Audio Process.


Abstract: The use of general time-frequency distributions as features for automatic speech recognition (ASR) is discussed in the context of hidden Markov classifiers. Short-time averages of quadratic operators, e.g., the energy spectrum, generalized first spectral moments, and short-time averages of the instantaneous frequency, are compared to standard front-end features and applied to ASR. Theoretical and experimental results indicate a close relationship among these feature sets.
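Two of the quantities named in this abstract can be made concrete. The sketch below computes the short-time energy spectrum of one windowed frame and its ordinary first spectral moment, which is the power-weighted mean frequency. This is the plain, non-generalized moment. The generalized moments used in the paper are not reproduced here, and the Hann window and function name are assumptions.

```python
import numpy as np

def frame_spectral_moments(frame, fs):
    """Energy spectrum of one Hann-windowed frame and its first spectral moment.

    Returns (power, first_moment_hz), where first_moment_hz is the
    power-weighted mean of the non-negative FFT bin frequencies.
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    power = np.abs(spectrum) ** 2                       # short-time energy spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)     # bin frequencies in Hz
    first_moment_hz = np.sum(freqs * power) / np.sum(power)
    return power, first_moment_hz
```

For a 256-sample frame of a 1 kHz tone sampled at 8 kHz, the first moment lands very close to 1000 Hz. For a band-limited speech frame it summarizes where the energy in that band is concentrated.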


“…For resonance signals, the Teager-Kaiser energy and the nonlinear energy operator Ψ provide a good estimate of the "real" source energy. Recently, Teager energy has been used for speech recognition in [10,12]. In this paper, we extend this work and design a front-end that combines an auditory-motivated filterbank with the Teager energy estimation method.…”

Section: Introduction (mentioning, confidence: 94%)

Auditory Teager energy cepstrum coefficients for robust speech recognition

2005


In this paper, a feature extraction algorithm for robust speech recognition is introduced. The feature extraction algorithm is motivated by human auditory processing and by the nonlinear Teager-Kaiser energy operator, which estimates the true energy of the source of a resonance. The proposed features are labeled Teager Energy Cepstrum Coefficients (TECCs). TECCs are computed by first filtering the speech signal through a dense non-constant-Q Gammatone filterbank and then estimating the "true" energy of the signal's source, i.e., the short-time average of the output of the Teager-Kaiser energy operator. Error analysis and speech recognition experiments show that TECCs and mel frequency cepstrum coefficients (MFCCs) perform similarly in clean recording conditions, while TECCs perform significantly better than MFCCs on noisy recognition tasks. Specifically, a relative word error rate improvement of 60% over the MFCC baseline is shown on the Aurora-3 database for the high-mismatch condition. Absolute error rate improvements ranging from 5% to 20% are shown for a phone recognition task in various types of additive noise.
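The TECC pipeline described in this abstract runs in four stages: a Gammatone filterbank, the Teager-Kaiser energy per band, a short-time average, then a cepstral transform. It can be sketched in NumPy as follows. Several details are assumptions rather than the authors' settings:

* 4th-order FIR Gammatone filters with Glasberg-Moore ERB bandwidths.
* 40 channels spaced on the ERB-rate scale.
* 25 ms frames with a 10 ms hop.
* Log compression followed by an unnormalized DCT-II keeping 13 coefficients.

All function names are hypothetical.

```python
import numpy as np

def erb_rate(f):
    """Glasberg-Moore ERB-rate (ERB number) of frequency f in Hz."""
    return 21.4 * np.log10(4.37e-3 * f + 1.0)

def inverse_erb_rate(e):
    """Frequency in Hz corresponding to ERB number e."""
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def gammatone_fir(fc, fs, order=4, duration=0.05):
    """Truncated FIR approximation of a gammatone impulse response centered at fc."""
    t = np.arange(int(duration * fs)) / fs
    bandwidth = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)    # 1.019 * ERB(fc), in Hz
    g = t ** (order - 1) * np.exp(-2 * np.pi * bandwidth * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sum(np.abs(g))                        # crude gain normalization

def teager_kaiser(x):
    """Psi[x](n) = x(n)^2 - x(n-1) x(n+1) on interior samples."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def dct_ii(v, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = v.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return v @ np.cos(np.pi * k * (2 * m + 1) / (2 * n)).T

def tecc(signal, fs, n_filters=40, n_ceps=13, frame_sec=0.025, hop_sec=0.010,
         f_low=100.0, f_high=None, eps=1e-10):
    """Hypothetical TECC-style features: one row of n_ceps coefficients per frame."""
    signal = np.asarray(signal, dtype=float)
    f_high = f_high if f_high is not None else 0.45 * fs
    centers = inverse_erb_rate(np.linspace(erb_rate(f_low), erb_rate(f_high), n_filters))
    flen, fhop = int(round(frame_sec * fs)), int(round(hop_sec * fs))
    n_frames = 1 + (len(signal) - 2 - flen) // fhop     # Psi output is 2 samples shorter
    log_energy = np.empty((n_frames, n_filters))
    for j, fc in enumerate(centers):
        band = np.convolve(signal, gammatone_fir(fc, fs), mode="same")
        psi = teager_kaiser(band)
        for i in range(n_frames):
            seg = psi[i * fhop: i * fhop + flen]
            # Short-time average of the Teager energy, clamped before the log.
            log_energy[i, j] = np.log(max(seg.mean(), eps))
    return dct_ii(log_energy, n_ceps)
```

Called on a 1 s signal at 8 kHz, tecc returns a (98, 13) array, with one 13-dimensional vector per 10 ms hop. The explicit cosine matrix avoids a SciPy dependency. scipy.fft.dct with norm="ortho" would be the usual choice in practice.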


“…The Teager energy is a noise-robust parameter for speech recognition because the effect of additive noise is attenuated: good results are obtained in the presence of car engine noise [20]. The instantaneous energy reflects only the amplitude of the signal, whereas the Teager energy operator reflects variations in both the amplitude and the frequency of the signal [45]. Figure 4 is an example of two spectrograms: one based on wavelet coefficients (Coiflet, 5 bands, Teager energy) and the other based on STFT coefficients for the same signal.…”

Section: Energy-based Parameters (mentioning, confidence: 99%)
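The contrast this snippet draws between instantaneous energy and Teager energy follows from Ψ[A cos(Ωn + φ)] = A² sin²Ω. The mean of x² depends only on A, while Ψ also grows with the frequency Ω. The small NumPy sketch below compares the two for unit-amplitude cosines. The frequencies and signal length are arbitrary choices.

```python
import numpy as np

def teager_kaiser(x):
    """Psi[x](n) = x(n)^2 - x(n-1) x(n+1) on interior samples."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def compare_energies(omegas, amplitude=1.0, n_samples=4000):
    """For each frequency (rad/sample), return (omega, mean x^2, mean Psi[x])."""
    n = np.arange(n_samples)
    rows = []
    for omega in omegas:
        x = amplitude * np.cos(omega * n)
        rows.append((omega, np.mean(x ** 2), np.mean(teager_kaiser(x))))
    return rows

for omega, inst, teo in compare_energies([0.1, 0.2, 0.4]):
    # mean x^2 stays near amplitude^2 / 2; mean Psi equals amplitude^2 * sin(omega)^2
    print(f"omega={omega}: mean x^2={inst:.3f}, mean Psi={teo:.4f}")
```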

A wavelet-based parameterization for speech/music discrimination

Didiot, Illina, Fohr, et al. (2010). Computer Speech & Language.


Abstract: This paper addresses the problem of parameterization for speech/music discrimination. The current successful parameterization based on cepstral coefficients uses the Fourier transform (FT), which is well suited to stationary signals. To take into account the non-stationarity of music and speech signals, this work proposes to study wavelet-based signal decomposition instead of the FT. Three wavelet families and several numbers of vanishing moments have been evaluated. Different types of energy, calculated for each frequency band obtained from the wavelet decomposition, are studied. Static, dynamic and long-term parameters were evaluated. The proposed parameterizations are integrated into two class/non-class classifiers: one for speech/non-speech and one for music/non-music. Experiments on realistic corpora, including different styles of speech and music (Broadcast News, Entertainment, Scheirer), illustrate the performance of the proposed parameterization, especially for music/non-music discrimination. Our parameterization yielded a significant reduction in error rate: more than 30% relative improvement over MFCC parameterization was obtained for the envisaged tasks.
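One way to compute per-band energies from a wavelet decomposition is sketched below. The cited work evaluates several wavelet families, and the snippet above mentions a Coiflet decomposition into 5 bands. To keep the example self-contained, this sketch swaps in the Haar wavelet, which needs no filter tables. Four levels give 5 bands. For each band it returns two features: the log mean-square ("instantaneous") energy and the log mean absolute Teager energy. Taking the absolute value of Ψ, and all names used here, are assumptions.

```python
import numpy as np

def haar_dwt_bands(x, levels=4):
    """Dyadic Haar decomposition; returns [d1, ..., d_levels, a_levels], finest first."""
    a = np.asarray(x, dtype=float)
    bands = []
    for _ in range(levels):
        if len(a) % 2:
            a = a[:-1]                        # drop one sample to keep an even length
        even, odd = a[0::2], a[1::2]
        bands.append((even - odd) / np.sqrt(2.0))   # detail (high-pass) coefficients
        a = (even + odd) / np.sqrt(2.0)             # approximation (low-pass)
    bands.append(a)
    return bands

def teager_kaiser(x):
    """Psi[x](n) = x(n)^2 - x(n-1) x(n+1) on interior samples."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def band_energies(frame, levels=4, eps=1e-10):
    """Per band: (log mean-square energy, log mean |Teager energy|)."""
    feats = []
    for band in haar_dwt_bands(frame, levels):
        inst = np.mean(band ** 2)
        teo = np.mean(np.abs(teager_kaiser(band))) if len(band) > 2 else 0.0
        feats.append((np.log(inst + eps), np.log(teo + eps)))
    return np.array(feats)
```

Because the Haar transform is orthonormal, the band energies of a frame whose length is divisible by 2^levels sum to the frame's total energy. This makes a convenient sanity check when substituting another wavelet family.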
