Waveform-based speech recognition using hidden filter models: parameter selection and sensitivity to power normalization

Sheikhzadeh, Hamid; Deng, Li

doi:10.1109/89.260337

Cited by 47 publications

(22 citation statements)

References 14 publications

“…The noise HMM generating the best score is selected and a fine scaling adjustment is carried out to adapt to the noise level using the Viterbi algorithm again. This procedure has been motivated by our earlier work [22] and is based on the assumption that noise training sequences with similar characteristics but varying levels result in AR-HMM's differing only in the AR gains (not in spectral shapes). In order to avoid confusing unvoiced speech (mainly fricatives) with nonspeech segments contaminated with noise, only segments more than 100 ms long are used for noise model updating.…”

Section: Noise Adaptation Algorithm (mentioning)

confidence: 99%

HMM-based strategies for enhancement of speech signals embedded in nonstationary noise

Sameti

Sheikhzadeh

Deng

et al. 1998

IEEE Trans. Speech Audio Process.

Self Cite

An improved hidden Markov model-based (HMM-based) speech enhancement system designed using the minimum mean square error principle is implemented and compared with a conventional spectral subtraction system. The improvements to the system are: 1) incorporation of mixture components in the HMM for noise in order to handle noise nonstationarity in a more flexible manner, 2) two efficient methods in the speech enhancement system design that make the system real-time implementable, and 3) an adaptation method to the noise type in order to accommodate a wide variety of noises expected under the enhancement system's operating environment. The results of the experiments designed to evaluate the performance of the HMM-based speech enhancement systems in comparison with spectral subtraction are reported. Three types of noise, namely white noise, simulated helicopter noise, and multitalker (cocktail party) noise, were used to corrupt the test speech signals. Both objective (global SNR) and subjective mean opinion score (MOS) evaluations demonstrate consistent superiority of the HMM-based enhancement systems that incorporate the innovations described in this paper over the conventional spectral subtraction method.
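The noise adaptation step quoted in the citation statement above (pick the best-scoring noise model, then adjust only its gain) can be illustrated with a much-simplified sketch. It assumes each noise model is a single autoregressive (AR) spectral shape with Gaussian excitation, and it swaps the HMM states and Viterbi scoring of the cited system for a plain maximum-likelihood score. The names `residual_power`, `select_and_scale`, `shapes`, and the synthetic test segment are hypothetical and do not come from the cited papers.

```python
import math
import random


def residual_power(samples, ar_coeffs):
    """Mean squared error of the linear predictor x[n] ~ sum_k a[k] * x[n-1-k]."""
    order = len(ar_coeffs)
    errors = [
        samples[n] - sum(ar_coeffs[k] * samples[n - 1 - k] for k in range(order))
        for n in range(order, len(samples))
    ]
    return sum(e * e for e in errors) / len(errors)


def select_and_scale(samples, noise_shapes):
    """Pick the AR shape with the best Gaussian log-likelihood and return
    it together with the maximum-likelihood excitation variance (the gain).

    Assumption: the candidate shapes are fixed and only the gain is adapted,
    mirroring the idea that noise level changes the AR gain but not the
    spectral shape.
    """
    best = None
    for label, ar_coeffs in noise_shapes.items():
        # For a fixed shape, the ML gain is the prediction residual power.
        gain = max(residual_power(samples, ar_coeffs), 1e-12)
        count = len(samples) - len(ar_coeffs)
        # Log-likelihood of the residual evaluated at the ML gain.
        log_lik = -0.5 * count * (math.log(2.0 * math.pi * gain) + 1.0)
        if best is None or log_lik > best[0]:
            best = (log_lik, label, gain)
    return best[1], best[2]


# Synthetic non-speech segment: AR(1) noise with excitation std 2.0 (variance 4.0).
rng = random.Random(0)
segment = [0.0]
for _ in range(2000):
    segment.append(0.9 * segment[-1] + rng.gauss(0.0, 2.0))

# Two hypothetical noise shapes; only the first matches the segment.
shapes = {"lowpass": [0.9], "highpass": [-0.9]}
name, gain = select_and_scale(segment, shapes)
print(name, round(gain, 2))
```

In this simplification the gain estimate is just the residual power of the selected shape. The cited system instead refines the scaling with a second Viterbi pass and only updates on non-speech segments longer than 100 ms.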

“…In particular, the work of [9] uses a similar type of MCE algorithm for a global linear transformation on linear predictive coefficient-based (LPC-based) cepstral coefficients. This is a special case of the method we have presented in this paper in that our transformation is made dependent on each speech class and on each HMM state.…”

Footnote 6 (from the same passage): “An earlier attempt to design a statistical speech recognizer using raw speech waveforms directly as the input features [26] encountered two main difficulties: i) prohibitively high computation burden for implementing a large system, and ii) less accurate modeling assumptions made in the statistical model (hidden filter model) characterizing the statistical properties of the speech waveform, in comparison with the models which characterize the statistical properties of the relatively slowly changing frame-based spectral features.”

Section: Summary and Discussion (mentioning)

confidence: 98%

HMM-based speech recognition using state-dependent, discriminatively derived transforms on mel-warped DFT features

Chengalvarayan

Deng

1997

IEEE Trans. Speech Audio Process.

In the study reported in this paper, we investigate interactions of front-end feature extraction and back-end classification techniques in hidden Markov model-based (HMM-based) speech recognition. The proposed model focuses on dimensionality reduction of the mel-warped discrete Fourier transform (DFT) feature space subject to maximal preservation of speech classification information, and aims at finding an optimal linear transformation on the mel-warped DFT according to the minimum classification error (MCE) criterion. This linear transformation, along with the HMM parameters, is automatically trained using the gradient descent method to minimize a measure of overall empirical error counts. A further generalization of the model allows integration of the discriminatively derived state-dependent transformation with the construction of dynamic feature parameters. Experimental results show that state-dependent transformation on mel-warped DFT features is superior in performance to the mel-frequency cepstral coefficients (MFCC's). An error rate reduction of 15% is obtained on a standard 39-class TIMIT phone classification task, in comparison with the conventional MCE-trained HMM using MFCC's that have not been subject to optimization during training.

“…Since the attractor of an RSS captures all the relevant information about the underlying system, it is an efficient choice for signal analysis, processing and classifications. Sheikh Zadeh and Deng has proposed a work in time domain representation of speech signal using autoregressive modelling (Sheikhzadeh and Deng 1994). The RSS approach proposed here has the advantage of extracting both linear and non-linear aspects of the entire system.…”

Section: Reconstructed State Space For Speech Recognition (mentioning)

confidence: 98%

Time–domain non-linear feature parameter for consonant classification

Thasleema

Prajith

Narayanan

2012

Int J Speech Technol

This paper introduces an accurate time-domain approach to model and classify the Malayalam Consonant-Vowel (CV) speech unit waveforms. The technique is based on statistical models of Reconstructed State Space (RSS). A feature extraction method using RSS-based State Space Point Distribution (SSPD) parameters is studied. The results of the simulation experiment performed on the Malayalam CV speech databases using Artificial Neural Network (ANN) and k-Nearest Neighborhood (k-NN) classifiers are also presented. The results indicate that the RSS approach is capable of increasing speaker-independent consonant speech recognition accuracy.
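A reconstructed state space is commonly built by time-delay embedding of the waveform, and the sketch below assumes that construction. The abstract does not give the embedding dimension or lag, so `dim`, `lag`, the function name `delay_embed`, and the sample values are illustrative choices only. The SSPD feature computation is not reproduced here.

```python
def delay_embed(signal, dim, lag):
    """Return delay vectors (x[n], x[n+lag], ..., x[n+(dim-1)*lag]).

    Each tuple is one point of the reconstructed state space.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    span = (dim - 1) * lag
    if len(signal) <= span:
        raise ValueError("signal too short for this dim and lag")
    return [
        tuple(signal[n + k * lag] for k in range(dim))
        for n in range(len(signal) - span)
    ]


# Hypothetical waveform samples, embedded in two dimensions with a lag of 2.
samples = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
points = delay_embed(samples, dim=2, lag=2)
for point in points:
    print(point)
```

A classifier feature could then be derived from how these points are distributed in the embedded space, which is the general role the abstract assigns to the SSPD parameters.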
