2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7178776

A Gaussian Mixture Model layer jointly optimized with discriminative features within a Deep Neural Network architecture

Cited by 46 publications (39 citation statements)
References 11 publications
“…As reported [13], the soft-max layer in CNN is equivalent to a single Gaussian model with a globally pooled covariance matrix. The work [34] applied a joint optimization strategy of feature extraction and classification in the task of automatic speech recognition. However, to the best of our knowledge, the joint optimization of CNN and GMM has not been studied in the area of unsupervised clustering.…”
Section: Related Work
confidence: 99%
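To make the joint optimization the citing work describes concrete, below is a minimal sketch of the idea: a diagonal-covariance GMM layer whose per-class log-likelihoods serve as logits for a cross-entropy loss, so the feature-extracting DNN and the GMM parameters are updated together by backpropagation. This assumes PyTorch; all layer sizes, names, and the parameterization are illustrative, not the paper's implementation.

```python
import math
import torch
import torch.nn as nn

class GMMLayer(nn.Module):
    """Per-class diagonal-covariance GMM over DNN features.
    forward() returns per-class log-likelihoods usable as logits."""
    def __init__(self, feat_dim, n_classes, n_components):
        super().__init__()
        self.means = nn.Parameter(torch.randn(n_classes, n_components, feat_dim))
        self.log_vars = nn.Parameter(torch.zeros(n_classes, n_components, feat_dim))
        self.mix_logits = nn.Parameter(torch.zeros(n_classes, n_components))

    def forward(self, feats):                        # feats: (batch, feat_dim)
        x = feats[:, None, None, :]                  # broadcast over classes/components
        log_comp = (-0.5 * ((x - self.means) ** 2 / self.log_vars.exp()
                            + self.log_vars + math.log(2 * math.pi))).sum(-1)
        log_mix = torch.log_softmax(self.mix_logits, dim=-1)
        return torch.logsumexp(log_comp + log_mix, dim=-1)  # (batch, n_classes)

# Joint optimization: one optimizer covers both the feature DNN and the GMM.
feat_net = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 32))
gmm = GMMLayer(feat_dim=32, n_classes=10, n_components=4)
opt = torch.optim.Adam(list(feat_net.parameters()) + list(gmm.parameters()), lr=1e-3)

x, y = torch.randn(8, 40), torch.randint(0, 10, (8,))    # placeholder batch
loss = nn.functional.cross_entropy(gmm(feat_net(x)), y)  # CE over GMM log-likelihoods
opt.zero_grad(); loss.backward(); opt.step()              # updates DNN and GMM together
```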
“…We find the hierarchical MoE models to be particularly interesting moving forward as they present a direction for construction of deep generative NNs. Works in this direction include van den Oord and Schrauwen (), Theis and Bethge (), and Variani, McDermott, and Heigold ().…”
Section: Mixture‐of‐experts Modeling
confidence: 99%
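As a rough illustration of the mixture-of-experts idea this statement points to (a flat, non-hierarchical variant; the model, sizes, and names below are assumptions for the sketch, not any cited paper's architecture), an input-dependent gating network can mix expert Gaussians into a conditional density trained by maximum likelihood:

```python
import math
import torch
import torch.nn as nn

class MoE(nn.Module):
    """p(y|x) = sum_k gate_k(x) * N(y; mu_k(x), diag(sigma_k(x)^2))."""
    def __init__(self, x_dim, y_dim, n_experts):
        super().__init__()
        self.gate = nn.Linear(x_dim, n_experts)               # input-dependent mixing
        self.mu = nn.Linear(x_dim, n_experts * y_dim)         # expert means
        self.log_sigma = nn.Linear(x_dim, n_experts * y_dim)  # expert scales
        self.n_experts, self.y_dim = n_experts, y_dim

    def log_prob(self, x, y):
        b = x.shape[0]
        log_gate = torch.log_softmax(self.gate(x), dim=-1)               # (b, K)
        mu = self.mu(x).view(b, self.n_experts, self.y_dim)
        log_sigma = self.log_sigma(x).view(b, self.n_experts, self.y_dim)
        z = (y[:, None, :] - mu) / log_sigma.exp()
        log_expert = (-0.5 * z ** 2 - log_sigma
                      - 0.5 * math.log(2 * math.pi)).sum(-1)             # (b, K)
        return torch.logsumexp(log_gate + log_expert, dim=-1)            # (b,)

moe = MoE(x_dim=5, y_dim=2, n_experts=3)
x, y = torch.randn(16, 5), torch.randn(16, 2)  # placeholder data
nll = -moe.log_prob(x, y).mean()               # maximum-likelihood training
nll.backward()
```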
“…In contrast to Hybrid systems whose parameters are all simultaneously trained, Tandem systems often have the GMMs estimated using a pre-trained BN DNN. This issue can be addressed by jointly training BN DNN and GMMs based on either the cross entropy (CE) [4,5] or the minimum phone error (MPE) criteria [6]. These jointly trained speaker independent (SI) Tandem systems yield similar word error rates (WERs) to SI Hybrid systems.…”
Section: Introduction
confidence: 99%
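For contrast with the joint training the quote advocates, here is a hedged sketch of the conventional two-stage Tandem setup it criticizes (assuming PyTorch for the bottleneck DNN and scikit-learn's GaussianMixture for EM; all data, sizes, and names are placeholders): the GMMs are estimated on fixed bottleneck features, so no gradient ever reaches the DNN.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

# Stage 1: a pre-trained DNN with a narrow bottleneck (BN) layer, held fixed.
bn_dnn = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 13))
frames = torch.randn(1000, 40)                 # placeholder acoustic frames
with torch.no_grad():
    bn_feats = bn_dnn(frames).numpy()          # BN features, detached from the DNN

# Stage 2: per-state GMMs fit by EM on the frozen features. Because no
# gradient reaches the DNN here, the features are not tuned for the GMMs,
# which is the gap that joint CE/MPE training closes.
states = np.random.randint(0, 3, size=1000)    # placeholder HMM-state labels
gmms = [GaussianMixture(n_components=4, covariance_type="diag")
        .fit(bn_feats[states == s]) for s in range(3)]
frame_loglik = np.stack([g.score_samples(bn_feats) for g in gmms], axis=1)  # (1000, 3)
```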