A robust speaker-independent isolated word HMM recognizer for operation over the telephone network

Song, Jinbao; Samouelian, A.

doi:10.1016/0167-6393(93)90027-i

Cited by 5 publications

(4 citation statements)

References 6 publications


“… From the missing feature theory [31], [32], the probability we want to compute is [equation (10), not recovered]. For the missing observation vector [symbol not recovered], we know [equation (11), not recovered]. By substituting (7) and (12) into (10), we finally obtain [equation (12), not recovered]. The transition probabilities have less effect in the Viterbi search than the observation probabilities [33]. Therefore, we can set [expression not recovered].…”

Section: B. Deletion of Erased Frames (mentioning)

confidence: 99%
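The quoted derivation argues that, because transition probabilities matter little in the Viterbi search, erased frames can simply be deleted. The sketch below illustrates that deletion strategy under stated assumptions: per-frame state log-likelihoods log b_j(x_t) are already computed, transition log-probabilities are ignored (treated as 0), and the model is a left-to-right HMM that starts in its first state and ends in its last. The function names are hypothetical and this is not the citing paper's implementation.

```python
import numpy as np

def viterbi_score_no_transitions(log_b):
    """Best-path log score of a left-to-right HMM with transitions ignored.

    log_b[t, j] holds log b_j(x_t). The path starts in state 0, ends in the
    last state, and at each frame either stays or advances by one state.
    """
    log_b = np.asarray(log_b, dtype=float)
    T, N = log_b.shape
    if T < N:
        return -np.inf  # too few frames to visit every state at least once
    delta = np.full(N, -np.inf)
    delta[0] = log_b[0, 0]
    for t in range(1, T):
        # Best predecessor is either the same state or the previous one.
        advance = np.concatenate(([-np.inf], delta[:-1]))
        delta = np.maximum(delta, advance) + log_b[t]
    return delta[-1]

def score_with_erasures(log_b, erased):
    """Delete erased frames (boolean mask, True = lost) and score the rest."""
    keep = ~np.asarray(erased, dtype=bool)
    return viterbi_score_no_transitions(np.asarray(log_b, dtype=float)[keep])
```

Deleting frames, rather than assigning them some substitute likelihood, means a lost frame neither rewards nor penalizes any model, which is the practical effect the quoted argument is after.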

A bitstream-based front-end for wireless speech recognition on IS-136 communications system

Kim

Cox

2001

IEEE Trans. Speech Audio Process.


In this paper, we propose a feature extraction method for a speech recognizer that operates in digital communication networks. The feature parameters are basically extracted by converting the quantized spectral information of a speech coder into a cepstrum. We also include the voiced/unvoiced information obtained from the bitstream of the speech coder in the recognition feature set. We performed speaker-independent connected digit HMM recognition experiments under clean, background noise, and channel impairment conditions. From these results, we found that the speech recognition system employing the proposed bitstream-based front-end gives superior word and string accuracies over a recognizer constructed from decoded speech signals. Its performance is comparable to that of a wireline recognition system that uses the cepstrum as a feature set. Next, we extended the evaluation of the proposed bitstream-based front-end to large vocabulary speech recognition with a name database. The recognition results proved that the proposed bitstream-based front-end also gives a comparable performance to the conventional wireline front-end.
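The abstract describes converting a speech coder's quantized spectral information into a cepstrum. One standard step in such a conversion is the LPC-to-cepstrum recursion c_n = a_n + Σ_{k=1}^{n-1} (k/n) c_k a_{n-k}, for an all-pole model 1 / (1 - Σ a_k z^{-k}). The sketch below assumes the coder's spectral parameters (for example, line spectral pairs) have already been converted to predictor coefficients a_1..a_p; whether Kim and Cox use exactly this route is not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of the all-pole model
    1 / (1 - sum_k a_k z^-k), given predictor coefficients a = [a_1..a_p].
    The gain term c_0 is omitted."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0  # a_n is zero beyond the model order
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a first-order model with a_1 = 0.5 the recursion reproduces the closed form c_n = 0.5^n / n, which is a convenient sanity check.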


“…) (Song & Samouelian, 1993). The signal was then processed by a 256-point (16 ms) Hamming window with a frame shift of 80 points (5 ms).…”

Section: Mel Frequency Cepstral Coefficients (mentioning)

confidence: 99%
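The quoted analysis settings imply a 16 kHz sampling rate: 256 points span 16 ms and 80 points span 5 ms, so consecutive frames overlap by 176 points. A minimal framing sketch with those values, assuming a plain NumPy array of samples; the function name is hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=80):
    """Split x into overlapping Hamming-windowed frames, one frame per row.

    Defaults follow the quoted setup: 256-point window (16 ms) and
    80-point shift (5 ms) at an assumed 16 kHz sampling rate.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i selects samples [i * hop, i * hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)
```

One second of audio at 16 kHz yields 197 full frames with these settings; any trailing samples that do not fill a whole window are dropped.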

Frame-level phoneme classification using inductive inference

Samouelian

1997

Computer Speech & Language

Self Cite


“… A conventional HMM recognition system is based on the one described in [3], with 5 left-to-right states and 6 mixture components per state being used to get a baseline performance.…”

Section: Recognition Phase (mentioning)

confidence: 99%
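The baseline configuration quoted above (5 left-to-right states, 6 mixture components per state) can be sketched as a banded transition matrix plus a per-state Gaussian mixture likelihood. Diagonal covariances and the initial self-loop probability p_stay are assumptions for illustration, not details taken from [3]; the function names are hypothetical.

```python
import numpy as np

N_STATES, N_MIX = 5, 6  # configuration quoted above

def left_to_right_transitions(n_states, p_stay=0.6):
    """Transition matrix allowing only self-loops and one-step advances.
    p_stay is a hypothetical initial value, not taken from the paper."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = p_stay, 1.0 - p_stay
    A[-1, -1] = 1.0  # final state absorbs
    return A

def gmm_log_likelihood(x, weights, means, variances):
    """log b_j(x) = log sum_m w_m N(x; mu_m, diag(var_m)) for one state.

    weights: (M,), means and variances: (M, D), x: (D,).
    """
    x = np.asarray(x, dtype=float)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    comp = np.log(weights) + log_norm + log_exp
    m = comp.max()  # log-sum-exp for numerical stability
    return m + np.log(np.sum(np.exp(comp - m)))
```

Evaluating the mixture in the log domain with a log-sum-exp avoids underflow when individual component densities are very small.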

“… The reason for this is because transitional probabilities of left-to-right HMMs play an insignificant role due to their dynamic numerical range compared with local likelihood, thereby they can be ignored altogether without any notable effect on recognition accuracy [3]. This simple sum-up through the frame-to-frame and state-to-state likelihood propagation in the Viterbi algorithm possesses a very interesting characteristic, i.e. the matching process can be viewed as a sophisticated model-dependent transformation, which transforms each acoustic vector $x_t$ to a scalar quantity $\log b_i(x_t)$.…”

mentioning

confidence: 98%
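With transition probabilities dropped, the Viterbi score is just a sum of per-frame scalars log b_{s_t}(x_t) along the best state sequence, which is the "model-dependent transformation" view in the quote above. A minimal sketch with backtracking that returns those scalars explicitly, assuming a left-to-right model that starts in state 0 and ends in the last state; the function name is hypothetical.

```python
import numpy as np

def viterbi_frame_scores(log_b):
    """Transition-free Viterbi alignment for a left-to-right HMM.

    log_b[t, j] holds log b_j(x_t). Returns (total score, best state
    sequence, per-frame scalars log b_{s_t}(x_t)); the total equals the
    sum of the per-frame scalars.
    """
    log_b = np.asarray(log_b, dtype=float)
    T, N = log_b.shape
    if T < N:
        raise ValueError("need at least one frame per state")
    delta = np.full(N, -np.inf)
    delta[0] = log_b[0, 0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        advance = np.concatenate(([-np.inf], delta[:-1]))
        came_from_prev = advance > delta
        back[t] = np.where(came_from_prev, np.arange(N) - 1, np.arange(N))
        delta = np.maximum(delta, advance) + log_b[t]
    # Backtrack from the final state.
    states = [N - 1]
    for t in range(T - 1, 0, -1):
        states.append(back[t, states[-1]])
    states = np.array(states[::-1])
    per_frame = log_b[np.arange(T), states]
    return delta[-1], states, per_frame
```

Because no transition term enters the recursion, each frame's contribution depends only on which state it is aligned to, so the per-frame scalars fully account for the match score.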

Enhancement of discriminative capabilities of HMM based recognizer through modification of Viterbi algorithm

Song

1995 International Conference on Acoustics, Speech, and Signal Processing


The algorithm proposed in this paper integrates the concepts of variable frame rate and discriminative analysis based on the Tanimoto ratio to modify the conventional Viterbi algorithm in such a way that the steady or stationary signal is compressed, while the transitional or non-stationary signal is emphasized through the frame-by-frame searching process. The usefulness of each frame is decided entirely within the Viterbi process and need not be the same for different models. To evaluate this algorithm, we tested a speech database of 9 highly confusable E-set English letters. With 5 states and 6 mixture components, the conventional HMM baseline system delivered a recognition accuracy of only 73.9%. In a preliminary experiment using the algorithm proposed in this paper, the recognition accuracy increased to 82.5%.
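The abstract does not give the exact weighting rule, so the sketch below only illustrates how a Tanimoto ratio, T(a, b) = a·b / (|a|² + |b|² - a·b), can separate stationary from transitional frames. The weight 1 - T(x_{t-1}, x_t) is a hypothetical choice, not the paper's method, and both function names are hypothetical.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto ratio a.b / (|a|^2 + |b|^2 - a.b); equals 1 for identical vectors."""
    ab = float(np.dot(a, b))
    denom = float(np.dot(a, a) + np.dot(b, b)) - ab
    # denom is zero only when both vectors are zero, i.e. identical.
    return ab / denom if denom > 0 else 1.0

def frame_weights(features):
    """Hypothetical per-frame weights: near 0 when a frame repeats its
    predecessor (stationary), larger when it changes (transitional).
    The first frame gets weight 1."""
    X = np.asarray(features, dtype=float)
    w = np.ones(len(X))
    for t in range(1, len(X)):
        w[t] = 1.0 - tanimoto(X[t - 1], X[t])
    return w
```

Such weights could, for instance, scale each frame's log-likelihood inside the Viterbi recursion so that runs of near-identical frames count less; in the paper the usefulness of each frame is decided within the Viterbi process and can differ per model, which this model-independent sketch does not capture.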
