Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
DOI: 10.1109/icassp.1998.675386

Decision tree state tying based on segmental clustering for acoustic modeling

Abstract: In this paper, a fast segmental clustering approach to decision-tree-tying-based acoustic modeling is proposed for large-vocabulary speech recognition. It is based on a two-level clustering scheme for robust decision tree state clustering. This approach extends the conventional segmental K-means approach to phonetic decision tree state tying based acoustic modeling. It achieves high recognition performance while reducing the model training time from days to hours compared with the approaches based on Baum-Welc…
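The abstract builds on phonetic decision-tree state tying. In the conventional formulation of that technique, each candidate phonetic question is scored by the log-likelihood gain of splitting a node, assuming a single diagonal Gaussian per node; with hard (Viterbi, segmental K-means style) frame-to-state assignments, per-state counts, sums, and sums of squares are sufficient to compute it. The sketch below illustrates only that standard split criterion, not the paper's two-level scheme. All function names, the toy triphone labels and questions, the variance floor, and `min_count` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Stats:
    """Sufficient statistics of the frames hard-aligned to one context-dependent state."""
    count: float
    sums: List[float]
    sumsq: List[float]


def stats_from_frames(frames: Sequence[Sequence[float]]) -> Stats:
    """Accumulate count, per-dimension sums and sums of squares from aligned frames."""
    dim = len(frames[0])
    return Stats(float(len(frames)),
                 [sum(f[d] for f in frames) for d in range(dim)],
                 [sum(f[d] ** 2 for f in frames) for d in range(dim)])


def pool(stats: List[Stats]) -> Stats:
    """Combine the statistics of several states as if they shared one Gaussian."""
    dim = len(stats[0].sums)
    return Stats(sum(s.count for s in stats),
                 [sum(s.sums[d] for s in stats) for d in range(dim)],
                 [sum(s.sumsq[d] for s in stats) for d in range(dim)])


def log_likelihood(s: Stats, var_floor: float = 1e-6) -> float:
    """Log-likelihood of the pooled frames under their ML diagonal Gaussian:
    -0.5 * N * sum_d (log(2*pi*var_d) + 1)."""
    total = 0.0
    for sx, sxx in zip(s.sums, s.sumsq):
        mean = sx / s.count
        var = max(sxx / s.count - mean * mean, var_floor)
        total += math.log(2 * math.pi * var) + 1.0
    return -0.5 * s.count * total


def best_question(states: Dict[str, Stats],
                  questions: Dict[str, Callable[[str], bool]],
                  min_count: float = 1.0) -> Tuple[str, float]:
    """Return the phonetic question with the largest log-likelihood gain."""
    parent = log_likelihood(pool(list(states.values())))
    best_name, best_gain = "", float("-inf")
    for name, ask in questions.items():
        yes = [s for ctx, s in states.items() if ask(ctx)]
        no = [s for ctx, s in states.items() if not ask(ctx)]
        if not yes or not no:
            continue  # the question does not split this node
        y, n = pool(yes), pool(no)
        if y.count < min_count or n.count < min_count:
            continue  # reject splits that leave a child with too little data
        gain = log_likelihood(y) + log_likelihood(n) - parent
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


if __name__ == "__main__":
    # Toy 1-D statistics for the centre state of four hypothetical /ih/ triphones.
    states = {
        "s-ih+t": stats_from_frames([[1.0], [1.2], [0.9]]),
        "z-ih+t": stats_from_frames([[1.1], [1.0], [1.3]]),
        "m-ih+t": stats_from_frames([[5.0], [5.2], [4.9]]),
        "n-ih+t": stats_from_frames([[5.1], [4.8], [5.3]]),
    }
    questions = {
        "L-Fricative": lambda ctx: ctx.split("-")[0] in {"s", "z", "f", "v"},
        "L-s": lambda ctx: ctx.split("-")[0] == "s",
    }
    print(best_question(states, questions))
```

In a full tree-growing loop, the node would be split with the winning question and the search repeated on each child until the gain or the child occupancy falls below a threshold.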

Cited by 31 publications (13 citation statements)
References: 9 publications
“…The 61-dimensional feature vectors (1 energy coefficient + 60 KL-transformed features) are used as features to build triphone HMM models on the WSJ SI-84 training set using the decision-tree state tying algorithm described in [12]. We point out that the first- and second-order derivatives of this feature set are not used.…”
Section: Experiments and Results (mentioning)
confidence: 99%
“…The speech recognizer is based on continuous density HMMs and the Bell Labs recognition engine [27]. The acoustic units are state-clustered triphone models, having three emitting states and a left-to-right topology [24].…”
Section: Results (mentioning)
confidence: 99%
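The three-emitting-state, left-to-right topology mentioned in this statement can be made concrete as a transition matrix. This is a minimal sketch, assuming no skip transitions and an illustrative self-loop probability of 0.6; both are assumptions, since the cited recognizer's actual transition structure and values are not given here, and `left_to_right_transitions` is a hypothetical helper.

```python
from typing import List


def left_to_right_transitions(n_states: int = 3,
                              self_loop: float = 0.6) -> List[List[float]]:
    """Transition matrix of a left-to-right HMM without skip transitions.

    Row i gives P(next | current = i) over columns 0..n_states; the extra
    last column is the exit from the model (into the next phone).
    """
    if n_states < 1 or not 0.0 <= self_loop < 1.0:
        raise ValueError("need n_states >= 1 and 0 <= self_loop < 1")
    rows = []
    for i in range(n_states):
        row = [0.0] * (n_states + 1)
        row[i] = self_loop              # stay in state i
        row[i + 1] = 1.0 - self_loop    # advance to state i+1 (or exit)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in left_to_right_transitions():
        print(row)
```

Because no row has mass below its diagonal, the model can never move back to an earlier state, and any path through it consumes at least three frames. In a state-clustered triphone system, the emitting states of different triphones would additionally share output distributions according to the tying tree.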
“…Usually a mixture of Gaussian densities is used for each state of the phoneme (or triphone) specific HMM [24]. In this case, the reduced distribution for the reliable part of the feature vector is the marginal determined by integrating over all unreliable components:…”
Section: Soft-feature Error Concealment (mentioning)
confidence: 99%
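The marginalization equation in this statement is elided. For the common special case of diagonal-covariance Gaussian mixtures, integrating each component over the unreliable dimensions simply drops those factors, so the marginal is the mixture evaluated on the reliable dimensions only. The sketch below illustrates that generic case under this diagonal-covariance assumption; it is not the cited paper's equation, and the function names and toy parameters are hypothetical.

```python
import math
from typing import Sequence


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float], dims: Sequence[int]) -> float:
    """Log density of a diagonal Gaussian restricted to the dimensions in `dims`."""
    return -0.5 * sum(math.log(2 * math.pi * var[d]) + (x[d] - mean[d]) ** 2 / var[d]
                      for d in dims)


def marginal_log_likelihood(x: Sequence[float], reliable: Sequence[bool],
                            weights: Sequence[float],
                            means: Sequence[Sequence[float]],
                            variances: Sequence[Sequence[float]]) -> float:
    """log p(x_reliable) for a diagonal-covariance Gaussian mixture.

    With diagonal covariances, integrating a component over the unreliable
    dimensions just drops those factors, so the values stored there are ignored.
    """
    dims = [d for d, ok in enumerate(reliable) if ok]
    comps = [math.log(w) + log_gauss_diag(x, m, v, dims)
             for w, m, v in zip(weights, means, variances)]
    top = max(comps)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps))


if __name__ == "__main__":
    # Two-component, 3-D toy state model; the second feature is marked unreliable.
    w = [0.3, 0.7]
    mu = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
    var = [[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]]
    print(marginal_log_likelihood([0.1, 40.0, 0.2], [True, False, True], w, mu, var))
```

With full covariances the marginal still exists in closed form, but it requires taking the corresponding sub-blocks of each covariance matrix rather than dropping diagonal terms.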
“…In particular, we have proposed a new segmental two-level clustering algorithm that combines the phonetic decision-tree-based state-tying with agglomerative clustering to improve model coverage on rarely seen acoustic phonetic events in the training data [31,3,33]. We have also devised a unified maximum likelihood framework to incorporate generalized phonetic and non-phonetic features in decision-tree-based state-tying.…”
Section: Acoustic Modeling (mentioning)
confidence: 99%
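This statement says only that decision-tree state tying is combined with agglomerative clustering. The sketch below assumes one plausible form of that second level: greedy bottom-up merging of tree leaves, where each step merges the pair whose union loses the least log-likelihood under a single diagonal Gaussian, so sparsely trained leaves can pool data across tree branches. `agglomerate`, the statistics tuple layout, and the variance floor are hypothetical, and this is not presented as the authors' algorithm.

```python
import math
from itertools import combinations
from typing import List, Tuple

# Hypothetical per-leaf statistics: (count, per-dim sums, per-dim sums of squares).
Stats = Tuple[float, List[float], List[float]]


def merge(a: Stats, b: Stats) -> Stats:
    """Pool the statistics of two clusters."""
    return (a[0] + b[0],
            [x + y for x, y in zip(a[1], b[1])],
            [x + y for x, y in zip(a[2], b[2])])


def log_likelihood(s: Stats, var_floor: float = 1e-6) -> float:
    """Log-likelihood of a cluster's frames under its ML diagonal Gaussian."""
    n, sums, sumsq = s
    total = 0.0
    for sx, sxx in zip(sums, sumsq):
        mean = sx / n
        var = max(sxx / n - mean * mean, var_floor)
        total += math.log(2 * math.pi * var) + 1.0
    return -0.5 * n * total


def agglomerate(leaves: List[Stats], target: int) -> List[List[int]]:
    """Greedily merge the pair of clusters whose merge loses the least likelihood,
    until `target` clusters remain; returns the leaf indices in each cluster."""
    if target < 1:
        raise ValueError("target must be >= 1")
    clusters = [([i], s) for i, s in enumerate(leaves)]
    while len(clusters) > target:
        best = None
        for (i, (_, si)), (j, (_, sj)) in combinations(enumerate(clusters), 2):
            loss = log_likelihood(si) + log_likelihood(sj) - log_likelihood(merge(si, sj))
            if best is None or loss < best[0]:
                best = (loss, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], merge(clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(members) for members, _ in clusters]


if __name__ == "__main__":
    # Four toy 1-D leaves: two near 0.1 and two near 10.1.
    leaves = [(2.0, [0.2], [0.04]), (2.0, [0.4], [0.10]),
              (2.0, [20.2], [204.04]), (2.0, [20.4], [204.10])]
    print(agglomerate(leaves, 2))
```

The pairwise search is quadratic in the number of clusters per step, which is acceptable for a few thousand leaves; a larger system would cache pairwise losses instead of recomputing them.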