1994
DOI: 10.1109/89.260337

Waveform-based speech recognition using hidden filter models: parameter selection and sensitivity to power normalization

Abstract: where he is currently an Associate Professor. From 1992 to 1993, he conducted sabbatical research at the Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge. His research interests include acoustic-phonetic modeling of speech, automatic speech recognition, statistical methods for signal analysis, computational phonology, auditory signal processing, and auditory neuroscience. In these areas, he has published more than 50 papers.

Cited by 47 publications (22 citation statements). References 14 publications.
“…The noise HMM generating the best score is selected and a fine scaling adjustment is carried out to adapt to the noise level using the Viterbi algorithm again. This procedure has been motivated by our earlier work [22] and is based on the assumption that noise training sequences with similar characteristics but varying levels result in AR-HMM's differing only in the AR gains (not in spectral shapes). In order to avoid confusing unvoiced speech (mainly fricatives) with nonspeech segments contaminated with noise, only segments more than 100 ms long are used for noise model updating.…”
Section: Noise Adaptation Algorithm (mentioning)
confidence: 99%
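The two-step procedure described in this excerpt (coarse noise-model selection followed by a fine gain adjustment) can be illustrated with a short sketch. The Python fragment below is a hypothetical illustration, not the cited authors' implementation: the `score` callable and the `with_gain` method on the AR-HMM objects are assumed helpers, and the 100 ms guard mirrors the segment-length condition mentioned in the quote.

```python
import numpy as np

MIN_SEGMENT_MS = 100  # shorter non-speech segments may actually be unvoiced fricatives

def update_noise_model(noise_models, segment, sample_rate, gain_grid, score):
    """Hypothetical sketch of the two-step noise adaptation described above.

    noise_models: candidate noise AR-HMMs (assumed to expose .with_gain(g))
    segment     : waveform samples of a detected non-speech segment
    gain_grid   : candidate multiplicative AR-gain factors
    score       : callable (model, segment) -> Viterbi log-likelihood
    """
    # Only segments longer than 100 ms are used for noise model updating.
    if 1000.0 * len(segment) / sample_rate <= MIN_SEGMENT_MS:
        return None

    # Coarse step: Viterbi-score every candidate noise model, keep the best.
    best = noise_models[int(np.argmax([score(m, segment) for m in noise_models]))]

    # Fine step: same spectral shape, rescale only the AR gains.
    best_gain = max(gain_grid, key=lambda g: score(best.with_gain(g), segment))
    return best.with_gain(best_gain)
```

The split into a coarse model search and a fine gain search reflects the stated assumption that noise models trained at different levels share spectral shape and differ only in AR gains.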
“…In particular, the work of [9] uses a similar type of MCE algorithm for a global linear transformation on linear predictive coefficient-based (LPC-based) cepstral coefficients. This is a special case of the method we have presented in this paper in that our transformation is made dependent on each speech class and on each HMM state. [Footnote 6: An earlier attempt to design a statistical speech recognizer using raw speech waveforms directly as the input features [26] encountered two main difficulties: i) prohibitively high computation burden for implementing a large system, and ii) less accurate modeling assumptions made in the statistical model (hidden filter model) characterizing the statistical properties of the speech waveform, in comparison with the models which characterize the statistical properties of the relatively slowly changing frame-based spectral features.]…”
Section: Summary and Discussion (mentioning)
confidence: 98%
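As background on the LPC-based cepstral coefficients mentioned in this excerpt, the sketch below shows the standard recursion for converting LPC coefficients to cepstral coefficients; the function name and sign convention are assumptions of this illustration, not taken from the cited papers.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients a[0..p-1] (i.e. a_1..a_p) to cepstral coefficients.

    Standard recursion (one common sign convention):
        c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k},   1 <= m <= p
        c_m =       sum_{k=m-p}^{m-1} (k/m) c_k a_{m-k}, m > p
    """
    p = len(a)
    c = np.zeros(n_ceps)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k - 1] * a[m - k - 1]
        c[m - 1] = acc
    return c

# A class- and state-dependent linear transform, as discussed in the excerpt,
# would then act on these features, e.g. y = W_s @ c + b_s for HMM state s.
```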
“…Since the attractor of an RSS captures all the relevant information about the underlying system, it is an efficient choice for signal analysis, processing and classification. Sheikhzadeh and Deng have proposed a time-domain representation of the speech signal using autoregressive modelling (Sheikhzadeh and Deng 1994). The RSS approach proposed here has the advantage of extracting both linear and non-linear aspects of the entire system.…”
Section: Reconstructed State Space for Speech Recognition (mentioning)
confidence: 98%
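The reconstructed state space (RSS) referred to in this excerpt is typically obtained by time-delay embedding of the waveform. The sketch below is a generic illustration of that construction; the embedding dimension and delay values are arbitrary examples, not parameters from the cited work.

```python
import numpy as np

def delay_embed(x, dim, delay):
    """Reconstruct a state space from a scalar time series by delay embedding.

    Row i of the result is [x[i], x[i+delay], ..., x[i+(dim-1)*delay]]; by
    Takens' theorem this trajectory preserves the geometry of the attractor
    of the underlying dynamical system (for suitable dim and delay).
    """
    n = len(x) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("time series too short for this embedding")
    return np.stack([x[i:i + n] for i in range(0, dim * delay, delay)], axis=1)

# Example with arbitrary parameters: a 400-sample frame, dimension 3, delay 8.
frame = np.random.randn(400)              # stand-in for a speech frame
rss = delay_embed(frame, dim=3, delay=8)  # shape (384, 3)
```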