Speech recognition using advanced HMM2 features

Weber, Katrin; Bengio, Samy; Bourlard, Hervé

doi:10.1109/asru.2001.1034590

Cited by 9 publications

(8 citation statements)

References 9 publications


“…This gives us a two-level HMM: a HMM where each state corresponds to a word, and where the output function is a HMM where each state corresponds to a letter. This relates to two other approaches that we are aware of (Fine et al, 1998) and (Weber et al, 2001).…”

Section: Perplexity Evaluation (mentioning)

confidence: 79%

Combining distributional and morphological information for part of speech induction

Clark

2003

Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics - EACL '03


“…Previously, promising results were obtained with both variants of HMM2. In [6], we reported word error rates (WER) of 14.0% (on the clean Numbers95 database, [1]) for variant (a). As described above, here the secondary HMM acted as likelihood estimator.…”

Section: Building From Previous Results (mentioning)

confidence: 99%

“…1, a secondary feature vector as used for the HMM2 system is thus composed of an FF2 coefficient (c_s), its first and second order derivatives (d_s and a_s) and a further coefficient reflecting the frequency position of that vector (f_s). Supplementing the 3-dimensional secondary feature vector by such a 'frequency index' has shown significant benefits for speech recognition performance, allowing a better modeling of formant positions (the reader is referred to [6] for more details on the frequency index, its motivations, realization and performance improvements).…”

Section: Features For Hmm2 (mentioning)

confidence: 99%

Increasing speech recognition robustness with HMM2

Weber

Bengio

Bourlard

2002

IEEE International Conference on Acoustics Speech and Signal Processing


The purpose of this paper is to investigate the behavior of HMM2 models for the recognition of noisy speech. It has previously been shown that HMM2 is able to model dynamically important structural information inherent in the speech signal, often corresponding to formant positions/tracks. As formant regions are known to be robust in adverse conditions, HMM2 seems particularly promising for improving speech recognition robustness. Here, we review different variants of the HMM2 approach with respect to their application to noise-robust automatic speech recognition. It is shown that HMM2 has the potential to tackle the problem of mismatch between training and testing conditions, and that a multi-stream combination of (already noise-robust) cepstral features and formant-like features (extracted by HMM2) improves the noise robustness of a state-of-the-art automatic speech recognition system.
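The secondary feature vector described in the citation statements above (a coefficient, its first and second order derivatives, and a frequency index) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `secondary_feature_vectors`, the use of simple finite differences (`np.gradient`) for the time derivatives, and a frequency index that starts at 1 are all assumptions made for illustration.

```python
import numpy as np

def secondary_feature_vectors(spec):
    """Build 4-dimensional secondary feature vectors [c_s, d_s, a_s, f_s].

    spec: array of shape (T, S) with T time frames (T >= 2) and
          S frequency coefficients per frame.
    Returns an array of shape (T, S, 4).
    """
    spec = np.asarray(spec, dtype=float)
    T, S = spec.shape
    # Assumption: derivatives are taken along time with finite differences.
    d = np.gradient(spec, axis=0)   # first order time derivative (d_s)
    a = np.gradient(d, axis=0)      # second order time derivative (a_s)
    # Assumption: the frequency index is the 1-based position in the frame.
    f = np.broadcast_to(np.arange(1, S + 1, dtype=float), (T, S))
    return np.stack([spec, d, a, f], axis=-1)

if __name__ == "__main__":
    frames = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])
    vectors = secondary_feature_vectors(frames)
    print(vectors.shape)
    print(vectors[1, 1])
```

Each temporal frame then yields a sequence of S such 4-dimensional vectors, which is the kind of sequence the secondary (frequency) HMM would be asked to model.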


“…It has been shown that this frequency information improves discrimination between the different phonemes (Weber et al, 2001c). However, the impact of the frequency coefficient is different depending on whether it is treated (1) as an additional feature component (feature combination) or (2) as a second feature stream (likelihood combination).…”

Section: A Hmm2 Design Options (mentioning)

confidence: 99%

“…The subvectors are typically low-dimensional feature vectors, consisting of, for example, a coefficient, its first and second order time derivatives and an additional frequency index (Weber et al, 2001c). If such a temporal feature vector is to be emitted by a specific temporal HMM state, the associated sequence of frequency sub-vectors is emitted by the secondary HMM associated with the corresponding temporal HMM state.…”

Section: The Hmm2 Feature Extractor (mentioning)

confidence: 99%

Evaluation of formant-like features on an automatic vowel classification task

de Wet

Weber

Boves

et al.

2004

The Journal of the Acoustical Society of America


This study investigates possibilities to find a low-dimensional, formant-related physical representation of speech signals, which is suitable for automatic speech recognition. This aim is motivated by the fact that formants are known to be discriminant features for speech recognition. Combinations of automatically extracted formant-like features and state-of-the-art, noise-robust features have previously been shown to be more robust in adverse conditions than state-of-the-art features alone. However, it is not clear how these automatically extracted formant-like features behave in comparison with true formants. The purpose of this paper is to investigate two methods to automatically extract formant-like features, i.e. robust formants and HMM2 features, and to compare these features to hand-labeled formants as well as to mel-frequency cepstral coefficients in terms of their performance on a vowel classification task. The speech data and hand-labeled formants that were used in this study are a subset of the American English vowels database presented in [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. Classification performance was measured on the original, clean data as well as in (simulated) adverse conditions. In combination with standard automatic speech recognition methods, the classification performance of the robust formant and HMM2 features compare very well to the performance of the hand-labeled formants.

PACS numbers: 43.72.Ne, 43.72.Ar

I. Introduction

Human speech signals can be described in many different ways (Flanagan, 1972; Rabiner and Schafer, 1978). Some descriptions are directly related to speech production, while others are more suitable for investigating speech perception. Some descriptive frameworks, of which the formant representation is a well-known example, have successfully been applied to both production and perception.

Speech production is often modeled as an acoustic source feeding into a linear filter (representing the vocal tract) with little or no interaction between the source and the filter. In terms of this model of acoustic speech production, the phonetically relevant properties of speech signals can be characterized by the resonance frequencies of the filter (to be completed with information on the source, in terms of periodicity and power). It is well known that the frequencies of the first two or three formants are sufficient information for the perceptual identification of vowels (Flanagan, 1972; Minifie et al., 1973). The formant representation is attractive because of its parsimonious character: it allows the representation of speech signals with a very small number of parameters. Not surprisingly, many attempts have been made to exploit the parametric formant representation in speech technology applications such as speech synthesis, speech coding and automatic speech recognition (ASR).

A special reason why formants make for an attractive representation of the acoustic characteristics of speech signals is their relation -by virt...
