Utterance verification is an important technology in the design of user-friendly speech recognition systems. It involves the recognition of keyword strings and the rejection of nonkeyword strings. This paper describes a hidden Markov model (HMM)-based utterance verification system built on the framework of statistical hypothesis testing. Two major design issues are addressed: how to construct keyword scoring criteria and how to construct string scoring criteria. For keyword verification, different alternative hypotheses are proposed based on the scores of antikeyword models and a general acoustic filler model. For string verification, different measures are proposed with the objective of detecting nonvocabulary word strings and possibly erroneous strings (so-called putative errors). This paper also motivates the need for discriminative hypothesis testing in verification. One such approach, based on minimum classification error training, is investigated in detail. When the proposed verification technique was integrated into a state-of-the-art connected digit recognition system, the string error rate for valid digit strings decreased by 57% at a 5% rejection rate. Furthermore, the system correctly rejected over 99.9% of nonvocabulary word strings.
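The hypothesis-testing framework described above can be illustrated with a minimal sketch. The combination rule for the alternative hypothesis and the minimum-over-keywords string measure below are illustrative assumptions, not the paper's exact criteria; the function names and threshold are hypothetical.

```python
import math

def keyword_confidence(log_p_keyword, log_p_antikeyword, log_p_filler):
    """Log-likelihood ratio between the keyword (null) hypothesis and an
    alternative hypothesis built from antikeyword and filler scores.
    Averaging the two alternative likelihoods is one illustrative choice."""
    log_p_alt = math.log(
        0.5 * (math.exp(log_p_antikeyword) + math.exp(log_p_filler))
    )
    return log_p_keyword - log_p_alt

def verify_string(keyword_llrs, threshold=0.0):
    """Accept the string only if every keyword's LLR clears the threshold:
    a minimum-over-keywords string measure (one of several possibilities
    for detecting nonvocabulary strings and putative errors)."""
    return min(keyword_llrs) >= threshold
```

A string whose weakest keyword score falls below the threshold is rejected, which is how a single rejection knob (here `threshold`) trades valid-string error rate against rejection rate.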
Abstract-Hidden Markov models (HMM's) for automatic speech recognition rely on high-dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is nonstationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high-dimensional data. These parameters can be chosen in two ways: 1) to maximize the likelihood of observed speech signals, or 2) to minimize the number of classification errors. We derive an expectation-maximization (EM) algorithm for maximum likelihood estimation and a gradient descent algorithm for improved class discrimination. Speech recognizers are evaluated on two tasks: a small-vocabulary task (connected alpha-digits) and a medium-vocabulary task (New Jersey town names). We find that modeling feature correlations by factor analysis leads to significantly increased likelihoods and word accuracies. Moreover, the rate of improvement with model size often exceeds that observed in conventional HMM's.
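The parameter savings behind the factor analysis covariance model can be sketched as follows. A minimal example, assuming the standard factor analysis parameterization Sigma = Lambda Lambda^T + Psi (loading matrix plus diagonal noise); the dimensions and random values here are illustrative, not taken from the paper's systems.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 2  # feature dimension d, number of factors k (k << d)

# Factor loading matrix and diagonal noise variances (illustrative values)
Lambda = rng.standard_normal((d, k))
Psi = np.abs(rng.standard_normal(d)) + 0.1  # keep strictly positive

# Factor analysis covariance: low-rank part plus diagonal noise
Sigma = Lambda @ Lambda.T + np.diag(Psi)

# Parameter count: d*k + d for factor analysis versus d*(d+1)/2
# for an unconstrained full covariance matrix
fa_params = d * k + d
full_params = d * (d + 1) // 2
```

With `d = 10` and `k = 2` the factored form needs 30 parameters instead of 55, and the gap widens quadratically with `d`, which is why a small number of factors can capture the dominant feature correlations far more economically than a full covariance Gaussian.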