In this chapter, we introduce techniques that fuse deep neural networks (DNNs) and Gaussian mixture models (GMMs). We first describe the Tandem and bottleneck approaches, in which DNNs are used as feature extractors. The hidden-layer activations, which provide a better representation than the raw input features, are used as features in GMM systems. We then introduce techniques that fuse the recognition results and frame-level scores of the DNN-HMM hybrid system with those of the GMM-HMM system.
Use DNN-Derived Features in GMM-HMM Systems
In Chap. 9, we showed that in deep neural network (DNN)-hidden Markov model (HMM) hybrid systems, DNNs jointly learn the nonlinear feature transformation and the log-linear classifier. More importantly, the feature representation learned by DNNs is more robust to speaker and environment variations than the original features. A natural idea is to treat the activations of the hidden and output layers of a DNN as better features and use them in conventional GMM-HMM systems.
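To make the idea concrete, the following minimal sketch shows how hidden-layer activations could be read out of a feed-forward DNN and treated as a new feature stream. It is an illustration under stated assumptions, not a specific system: the layer sizes, the sigmoid nonlinearity, the random (untrained) weights, and the helper name `hidden_layer_features` are all hypothetical, and in practice the weights would come from a DNN trained as in the hybrid system.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic nonlinearity."""
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer_features(frames, weights, biases, layer):
    """Forward frames through the first `layer` hidden layers (1-based)
    and return that layer's activations, one row per frame."""
    h = frames
    for W, b in zip(weights[:layer], biases[:layer]):
        h = sigmoid(h @ W + b)
    return h

rng = np.random.default_rng(0)

# Hypothetical topology: 39-dim acoustic input, two wide hidden layers,
# then a narrow 40-dim layer whose activations serve as the new features.
layer_sizes = [39, 512, 512, 40]

# Random weights stand in for a trained DNN (assumption for illustration).
weights = [rng.normal(0.0, 0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

# 100 frames of stand-in acoustic features.
frames = rng.normal(size=(100, 39))

# DNN-derived features: activations of the third hidden layer.
dnn_features = hidden_layer_features(frames, weights, biases, layer=3)
print(dnn_features.shape)
```

The resulting matrix has one feature vector per frame and would take the place of the original acoustic features when training the GMM-HMM system. The choice of which layer to read out is a design decision that the sketch leaves open through the `layer` argument.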