[Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing 1991
DOI: 10.1109/icassp.1991.150377

On the phonetic structure of a large hidden Markov model

Citations: cited by 8 publications (4 citation statements)
References: 4 publications
“…The results we obtained can be compared with other recent reports [5], [14], [15]. Bocchieri and Wilpon [13] employed 38-dimensional delta-delta cepstrum (DDCEP) feature vectors and obtained 98.6% recognition accuracy for the TI connected digits and 80.8% for the E-set letters.…”
Section: Results (supporting)
confidence: 62%
“…De Haan and Ececioglu reported 97.2% recognition accuracy for the TI isolated digits by using a 19-dimensional feature vector and a feature map strategy [5]. Another interesting result is 50.6% recognition accuracy for TIMIT phoneme recognition reported by Pepper and Clements, who employed 25-dimensional feature vectors and a large 128-state HMM [14].…”
Section: Results (mentioning)
confidence: 93%
“…While no direct comparison with the previously established HMM segment vocoders of [T98] or [MTK98] is done, this method shows that an ergodic HMM with number of states as 128 yields a good overall quality and intelligibility with speaker characteristics preserved at an effective bit-rate of 128 bps, though no formal listening tests are done. The notions of the optimal number of states being 64 for American English to correspond to the phones in this work is based on earlier work [Pepp90,FC86,Pepp91] and is closely in corroboration with a related work on using ergodic-HMMs to model spoken languages for language-identification [RSS03, SR05].…”
Section: Ergodic HMM Framework (mentioning)
confidence: 55%
“…For an effective speech coder, the size of the EHMM must be considerably larger. Previous work [4][5][6] has shown that at least 64 states are required to represent all the acoustic variations present in fluent North American English. A single EHMM of this size can then be trained to model all of the sounds in such speech as well as the time dependent transitions from one sound to another.…”
Section: Ergodic Hidden Markov Models (mentioning)
confidence: 99%