Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
DOI: 10.1109/icassp.1998.675349
Using aggregation to improve the performance of mixture Gaussian acoustic models

Abstract: This paper investigates the use of aggregation as a means of improving the performance and robustness of mixture Gaussian models. This technique produces models that are more accurate and more robust to different test sets than traditional cross-validation using a development set. A theoretical justification for this technique is presented along with experimental results in phonetic classification, phonetic recognition, and word recognition tasks on the TIMIT and Resource Management corpora. In speech classifi…

Cited by 10 publications (5 citation statements)
References 8 publications
“…Since the EM algorithm is only guaranteed to converge to a local maximum, the final model parameters are highly dependent on the initial conditions obtained from the K-means clustering. To improve the performance and robustness of the mixture models, we used a technique called aggregation (Hazen and Halberstadt 1998), which is described in Section 4.2.…”
Section: Speech Recognition System (mentioning, confidence: 99%)
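The snippet above motivates aggregation as a way to reduce sensitivity to the K-means starting point of EM. A minimal sketch of that idea (not the authors' code): assuming scikit-learn's GaussianMixture as a stand-in for the acoustic models, several mixtures are trained from different K-means seeds and their per-frame likelihoods are averaged. Note that Hazen and Halberstadt (1998) aggregate models trained on different cross-validation partitions of the data, which this simplified sketch does not reproduce.

import numpy as np
from scipy.special import logsumexp
from sklearn.mixture import GaussianMixture

def train_aggregated_gmms(frames, n_components=16, n_models=4):
    """Train n_models GMMs on the same feature frames, each starting EM
    from a different K-means initialization (varied via random_state)."""
    models = []
    for seed in range(n_models):
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag",
                              init_params="kmeans",
                              random_state=seed,
                              max_iter=100)
        gmm.fit(frames)
        models.append(gmm)
    return models

def aggregated_log_likelihood(models, frames):
    """Aggregate the models by averaging their per-frame likelihoods,
    computed stably in the log domain."""
    log_liks = np.stack([m.score_samples(frames) for m in models])
    return logsumexp(log_liks, axis=0) - np.log(len(models))

A classifier built this way would keep one aggregated set of mixtures per phone class and assign a segment to the class with the highest summed aggregated log-likelihood.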
“…3) A k-means procedure is applied to cluster Gaussian mixture components into each node. In each iteration, when KL divergence is used, the mean and variance can be updated either by the ML approach [(10), (11)] or by the KL approach [(14), (15)]. Similarly, the ML or BH approach [(16), (17)] can be applied when Bhattacharyya distance is chosen.…”
Section: Tree Construction (mentioning, confidence: 99%)
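For reference, the two dissimilarity measures named in the snippet have simple closed forms for diagonal-covariance Gaussians, sketched below; the helper names are hypothetical, and the bracketed equation numbers (10)-(17) refer to the cited paper and are not reproduced here.

import numpy as np

def kl_divergence_diag(mu1, var1, mu2, var2):
    """KL(N1 || N2) between two diagonal-covariance Gaussians."""
    d = mu1.shape[0]
    return 0.5 * (np.sum(var1 / var2)
                  + np.sum((mu2 - mu1) ** 2 / var2)
                  - d
                  + np.sum(np.log(var2) - np.log(var1)))

def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    var = 0.5 * (var1 + var2)
    return (0.125 * np.sum((mu1 - mu2) ** 2 / var)
            + 0.5 * np.sum(np.log(var) - 0.5 * (np.log(var1) + np.log(var2))))

A k-means-style clustering of mixture components into tree nodes would use one of these as its distance function when assigning components to node centroids.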
“…In [9] and [10], approaches based on tree-structured Gaussian densities were proposed to achieve computational efficiency in speech recognition. A tree structure with bottom-up clustering was also proposed in [11] for the purpose of pruning the aggregated Gaussian models. In [12], a decision-tree technique was proposed to partition the feature space hierarchically.…”
(mentioning, confidence: 99%)
“…A5 consistently outperforms measurements A1–A4. Compared to the baseline, A5 exhibits improvement that is statistically significant. We have also implemented 4-fold model aggregation [16] for A5, obtaining a 22.9% error rate on the Core Test set. We then combined this classifier with 8 other classifiers defined over 8 segmental features described in [14], obtaining an error rate of 18.5% on the same set, which is an improvement over the 18.7% obtained without the wavelet-based feature.…”
Section: Results (mentioning, confidence: 99%)
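The classifier-combination step mentioned in this snippet can be sketched as follows. Summing per-class log scores, i.e. treating the segmental feature streams as roughly independent, is an assumption made for illustration and is not necessarily the combination rule used in [14]; the function name is hypothetical.

import numpy as np

def combine_segment_scores(per_classifier_log_scores):
    """per_classifier_log_scores: list of (n_classes,) arrays, one per
    classifier, holding per-class log scores for a single segment.
    Returns the index of the winning class after combination."""
    combined = np.sum(np.stack(per_classifier_log_scores), axis=0)
    return int(np.argmax(combined))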