Signal modeling for isolated word recognition

Karnjanadecha, Montri; Zahorian, Stephen A.

doi:10.1109/icassp.1999.758120

Cited by 10 publications

(9 citation statements)

References 4 publications


“…The block features were recomputed every 10 ms. No manual segmentation or phonetic labeling was required or used. The primary modification, relative to [8] and [9], is that the block length was varied at both ends of each analyzed utterance, rather than only for the beginning section. See Fig.1 for an illustration of this variable block length method.…”

Section: Variable Block Length Methods (mentioning)

confidence: 99%

Signal modeling for high-performance robust isolated word recognition

Karnjanadecha

Zahorian

2001

IEEE Trans. Speech Audio Process.

Self Cite


This paper describes speech signal modeling techniques which are well suited to high performance and robust isolated word recognition. We present new techniques for incorporating spectral/temporal information as a function of temporal position within each word. In particular, spectral/temporal parameters are computed using both variable length blocks with a variable spacing between blocks. We tested features computed with these methods using an alphabet recognition task based on the ISOLET database. The Hidden Markov Model Toolkit (HTK) was used to implement the isolated word recognizer with whole word HMM models. The best accuracy achieved for speaker independent alphabet recognition, using 50 features, was 97.9%, which represents a new benchmark for this task. We also tested these methods with deliberate signal degradation using additive Gaussian noise and telephone band limiting and found that the recognition degrades gracefully and to a smaller degree than for control cases based on MFCC coefficients and delta cepstra terms.
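The variable block length idea mentioned in the citation statement and abstract above can be pictured with a small sketch. It is only an illustration under stated assumptions: the function name `block_lengths`, the parameters `min_len`, `max_len`, and `ramp`, the linear ramp, and the choice of shorter blocks near the utterance ends are all hypothetical and are not taken from the paper, which states only that the block length varies at both ends.

```python
def block_lengths(num_blocks, min_len=4, max_len=16, ramp=4):
    """Return one block length (in frames) per block position.

    Assumption for illustration: lengths grow linearly from min_len to
    max_len over the first `ramp` positions and shrink symmetrically over
    the last `ramp` positions. The paper's actual schedule may differ.
    """
    lengths = []
    for i in range(num_blocks):
        # Distance (in block positions) to the nearer end of the utterance.
        edge = min(i, num_blocks - 1 - i)
        # Fraction of the way through the ramp, capped at 1 in the middle.
        frac = min(edge / ramp, 1.0)
        lengths.append(round(min_len + frac * (max_len - min_len)))
    return lengths


if __name__ == "__main__":
    # With blocks recomputed every 10 ms, position i starts at i * 10 ms.
    for i, n in enumerate(block_lengths(10)):
        print(f"block at {i * 10:3d} ms spans {n} frames")
```

The schedule is symmetric by construction, which mirrors the statement that the length is varied at both ends of the utterance rather than only at the beginning.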


“…For both training and testing data, the modified Discrete Cosine Transformation Coefficients (DCTC) and Discrete Cosine Series Coefficients (DCSC) (Zahorian et al., 1991; Zahorian et al., 1997; Zahorian et al., 2002; Karnjanadecha & Zahorian, 1999) were extracted as original features. The modified DCTC is used for representing speech spectra, and the modified DCSC is used to represent spectral trajectories.…”

Section: DCTC/DCSC Speech Features (mentioning)

confidence: 99%

Nonlinear Dimensionality Reduction Methods for Use with Automatic Speech Recognition

Zahorian

Hu

2011

Speech Technologies
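The division of labor in the statement above (DCTC for spectra, DCSC for spectral trajectories) can be sketched as two cosine expansions, one over frequency and one over time. This is a minimal sketch using plain, unmodified cosine bases; the "modified" versions referred to in the citation involve details that are not given here. The helper names `cosine_expand`, `dctc`, and `dcsc` are hypothetical.

```python
import math


def cosine_expand(values, num_terms):
    """Project a sequence onto the first num_terms cosine basis functions."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n)
            for i, v in enumerate(values)) / n
        for k in range(num_terms)
    ]


def dctc(log_spectrum, num_terms):
    """Cosine expansion over frequency: summarizes one frame's spectrum."""
    return cosine_expand(log_spectrum, num_terms)


def dcsc(dctc_frames, num_terms):
    """Cosine expansion over time: one trajectory summary per DCTC index.

    dctc_frames is a list of per-frame DCTC vectors within one block.
    """
    trajectories = zip(*dctc_frames)  # regroup values by DCTC index
    return [cosine_expand(list(t), num_terms) for t in trajectories]


if __name__ == "__main__":
    # Ten identical frames with a flat log spectrum (synthetic input).
    frames = [dctc([1.0] * 32, 4) for _ in range(10)]
    features = dcsc(frames, 3)
    print(len(features), "x", len(features[0]), "features per block")
```

For a flat, unchanging spectrum only the zeroth term in each dimension is nonzero, which is a quick sanity check that the two expansions capture spectral shape and its change over time, respectively.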


“…In conclusion, we can say that some of the disadvantage of phoneme based recognizers as in [1] when compared to word based recognizer is complexity of the system and the word transcription must be known [2]. Also, from our review, many of these testing and experiments were done using HMMs and modified techniques.…”

Section: Some Existing Work (mentioning)

confidence: 99%

“…Signal modeling for high performance and robust isolated word recognition were proposed in [2,8]. The authors proposed a new technique for incorporating temporal and spectral feature within each word.…”

Section: Some Existing Work (mentioning)

confidence: 99%

“…Isolated speech recognition system may also use spoken digits to test its recognition accuracy. Spoken alphabet recognition may have several applications among them, automated directory assistance to retrieve information such as spelling names, telephone numbers addresses and ZIP codes [1,2]. Spoken alphabet recognition may be seen as a simple task for human beings but unfortunately, for machines this can be a challenging task due to high acoustic similarities among certain groups of letters [1][2][3]. High acoustic similarities may cause difficulty in classification while low acoustic similarities causes ease to discriminate among classes for speech recognition systems. An alphabet set which has been identified to be the most confusable for speech recognition is the so called E-set letters.…”

Section: Introduction (mentioning)

confidence: 99%


Spoken English Alphabet Recognition with Mel Frequency Cepstral Coefficients and Back Propagation Neural Networks

Adam

Salam

2012

IJCA


Spoken alphabet recognition as one of the subsets of speech recognition and pattern recognition has many applications. Unfortunately, spoken alphabet recognition might not be a simple task due to highly confusable set of letters as presented in the English alphabets. The highly acoustic similarities that contribute to the confusability may hinder the accuracy of speech recognition systems. One of the confusable set is called the E-set letters which consist of the letters B, C, D, E, G, P, T, V and Z. In this study, we present an investigation of isolated alphabet speech recognition system using the Mel Frequency Cepstral Coefficients (MFCC) and Back-propagation Neural Network (BPNN) for the E-set and for all the 26 English alphabets. Learning rates and momentum rates of the BPNN are adjusted and varied in order to achieve the best recognition rate for the E-set and all the 26 alphabets. By adjusting these parameters, we managed to achieve 62.28% and 70.49% recognition rate for E-set recognition under speaker-independent and speaker-dependent conditions respectively.
