2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7178776

A Gaussian Mixture Model layer jointly optimized with discriminative features within a Deep Neural Network architecture

Cited by 46 publications (39 citation statements)
References 11 publications
“…As reported [13], the soft-max layer in CNN is equivalent to a single Gaussian model with a globally pooled covariance matrix. The work [34] applied a joint optimization strategy of feature extraction and classification in the task of automatic speech recognition. However, to the best of our knowledge, the joint optimization of CNN and GMM has not been studied in the area of unsupervised clustering.…”
Section: Related Work
confidence: 99%
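To make the joint optimization the citing work describes concrete, below is a minimal sketch of the idea: a diagonal-covariance GMM layer whose per-class log-likelihoods serve as logits for a cross-entropy loss, so the feature-extracting DNN and the GMM parameters are updated together by backpropagation. This assumes PyTorch; all layer sizes, names, and the parameterization are illustrative, not the paper's implementation.

```python
import math
import torch
import torch.nn as nn

class GMMLayer(nn.Module):
    """Per-class diagonal-covariance GMM over DNN features.
    forward() returns per-class log-likelihoods usable as logits."""
    def __init__(self, feat_dim, n_classes, n_components):
        super().__init__()
        self.means = nn.Parameter(torch.randn(n_classes, n_components, feat_dim))
        self.log_vars = nn.Parameter(torch.zeros(n_classes, n_components, feat_dim))
        self.mix_logits = nn.Parameter(torch.zeros(n_classes, n_components))

    def forward(self, feats):                        # feats: (batch, feat_dim)
        x = feats[:, None, None, :]                  # broadcast over classes/components
        log_comp = (-0.5 * ((x - self.means) ** 2 / self.log_vars.exp()
                            + self.log_vars + math.log(2 * math.pi))).sum(-1)
        log_mix = torch.log_softmax(self.mix_logits, dim=-1)
        return torch.logsumexp(log_comp + log_mix, dim=-1)  # (batch, n_classes)

# Joint optimization: one optimizer covers both the feature DNN and the GMM.
feat_net = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 32))
gmm = GMMLayer(feat_dim=32, n_classes=10, n_components=4)
opt = torch.optim.Adam(list(feat_net.parameters()) + list(gmm.parameters()), lr=1e-3)

x, y = torch.randn(8, 40), torch.randint(0, 10, (8,))    # placeholder batch
loss = nn.functional.cross_entropy(gmm(feat_net(x)), y)  # CE over GMM log-likelihoods
opt.zero_grad(); loss.backward(); opt.step()              # updates DNN and GMM together
```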
“…We find the hierarchical MoE models to be particularly interesting moving forward as they present a direction for construction of deep generative NNs. Works in this direction include van den Oord and Schrauwen (), Theis and Bethge (), and Variani, McDermott, and Heigold ().…”
Section: Mixture‐of‐experts Modeling
confidence: 99%
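As a rough illustration of the mixture-of-experts idea this statement points to (a flat, non-hierarchical variant; the model, sizes, and names below are assumptions for the sketch, not any cited paper's architecture), an input-dependent gating network can mix expert Gaussians into a conditional density trained by maximum likelihood:

```python
import math
import torch
import torch.nn as nn

class MoE(nn.Module):
    """p(y|x) = sum_k gate_k(x) * N(y; mu_k(x), diag(sigma_k(x)^2))."""
    def __init__(self, x_dim, y_dim, n_experts):
        super().__init__()
        self.gate = nn.Linear(x_dim, n_experts)               # input-dependent mixing
        self.mu = nn.Linear(x_dim, n_experts * y_dim)         # expert means
        self.log_sigma = nn.Linear(x_dim, n_experts * y_dim)  # expert scales
        self.n_experts, self.y_dim = n_experts, y_dim

    def log_prob(self, x, y):
        b = x.shape[0]
        log_gate = torch.log_softmax(self.gate(x), dim=-1)               # (b, K)
        mu = self.mu(x).view(b, self.n_experts, self.y_dim)
        log_sigma = self.log_sigma(x).view(b, self.n_experts, self.y_dim)
        z = (y[:, None, :] - mu) / log_sigma.exp()
        log_expert = (-0.5 * z ** 2 - log_sigma
                      - 0.5 * math.log(2 * math.pi)).sum(-1)             # (b, K)
        return torch.logsumexp(log_gate + log_expert, dim=-1)            # (b,)

moe = MoE(x_dim=5, y_dim=2, n_experts=3)
x, y = torch.randn(16, 5), torch.randn(16, 2)  # placeholder data
nll = -moe.log_prob(x, y).mean()               # maximum-likelihood training
nll.backward()
```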
“…In contrast to Hybrid systems whose parameters are all simultaneously trained, Tandem systems often have the GMMs estimated using a pre-trained BN DNN. This issue can be addressed by jointly training BN DNN and GMMs based on either the cross entropy (CE) [4,5] or the minimum phone error (MPE) criteria [6]. These jointly trained speaker independent (SI) Tandem systems yield similar word error rates (WERs) to SI Hybrid systems.…”
Section: Introduction
confidence: 99%
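For contrast with the joint training the quote advocates, here is a hedged sketch of the conventional two-stage Tandem setup it criticizes (assuming PyTorch for the bottleneck DNN and scikit-learn's GaussianMixture for EM; all data, sizes, and names are placeholders): the GMMs are estimated on fixed bottleneck features, so no gradient ever reaches the DNN.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

# Stage 1: a pre-trained DNN with a narrow bottleneck (BN) layer, held fixed.
bn_dnn = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 13))
frames = torch.randn(1000, 40)                 # placeholder acoustic frames
with torch.no_grad():
    bn_feats = bn_dnn(frames).numpy()          # BN features, detached from the DNN

# Stage 2: per-state GMMs fit by EM on the frozen features. Because no
# gradient reaches the DNN here, the features are not tuned for the GMMs,
# which is the gap that joint CE/MPE training closes.
states = np.random.randint(0, 3, size=1000)    # placeholder HMM-state labels
gmms = [GaussianMixture(n_components=4, covariance_type="diag")
        .fit(bn_feats[states == s]) for s in range(3)]
frame_loglik = np.stack([g.score_samples(bn_feats) for g in gmms], axis=1)  # (1000, 3)
```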