A comparative large scale study of MLP features for Mandarin ASR

Valente, Fabio; Magimai-Doss, Mathew; Plahl, Christian; Ravuri, Suman; Wang, Wen

doi:10.21437/interspeech.2010-383

Cited by 17 publications

(5 citation statements)

References 17 publications


“…The cutoff frequency for both filter-banks is approximately 10 Hz. The output of the MRASTA filtering is then processed according to a hierarchy of MLPs progressively moving from high to low modulation frequencies or, equivalently, from short to long temporal context [7]. The effect of this sequential processing is that the first MLP, trained on short temporal context, is effective on most of the phonetic classes apart from stops and affricates.…”

Section: MLP Architectures (mentioning)

confidence: 99%

“…Because of the large dimension of these time windows, a number of techniques for efficiently encoding the information have been proposed like MRASTA [3], DCT-TRAPS [4], and wLP-TRAPS [5]. The second direction includes a number of heterogeneous techniques that aim at overcoming the pitfalls of the three-layer MLP classifier, including bottleneck architectures [6], hierarchical architectures [7], and multi-stream approaches [8].…”

Section: Introduction (mentioning)

confidence: 99%

“…In our previous related work [7], we investigated a subset of these techniques, namely, the MRASTA processing and its hierarchical version in a Mandarin broadcast LVCSR system developed in the framework of the GALE project. This paper aims at complementing that study by including other MLP input features (DCT-TRAPS and wLP-TRAPS) as well as bottleneck architectures in order to cover all the front-ends that have been proposed and integrated into LVCSR systems.…”

Section: Introduction (mentioning)

confidence: 99%

“…The study is carried out on the same Mandarin Broadcast system described in [7], and we examined the performance of the MLP features as stand-alone front ends and in concatenation with spectral features (MFCC). The remainder of this work is organized as follows.…”

Section: Introduction (mentioning)

confidence: 99%


Analysis and comparison of recent MLP features for LVCSR systems

Valente, Magimai-Doss, Wang (Interspeech 2011)

Self Cite


“…Since in the state-of-the-art ASR systems [9,10] MLPs are mainly used in TANDEM approach as features, in this paper our main goal is to compare the two different acoustic modeling techniques not only on MFCC, but also on concatenated cepstral and posterior features. Therefore, evaluating several feature transformation techniques (SAT, LDA) developed for GMM, the study is also extended to context-dependent MLP features.…”

Section: Introduction (mentioning)

confidence: 99%

Context-dependent MLPs for LVCSR: TANDEM, hybrid or both?

et al. 2012


Fuse Deep Neural Network and Gaussian Mixture Model Systems

Deng

2014

Automatic Speech Recognition


In this chapter, we introduce techniques that fuse deep neural networks (DNNs) and Gaussian mixture models (GMMs). We first describe the Tandem and bottleneck approach in which DNNs are used as feature extractors. The hidden layers, which are a better representation than the raw input features, are used as features in the GMM systems. We then introduce techniques that fuse the recognition results and frame-level scores of the DNN-HMM hybrid system with those of the GMM-HMM system.

Use DNN-Derived Features in GMM-HMM Systems

In Chap. 9, we have shown that in the deep neural network (DNN)-hidden Markov model (HMM) hybrid systems DNNs jointly learn the nonlinear feature transformation and the log-linear classifier. More importantly, the feature representation learned by DNNs is more robust to speaker and environment variations than the original feature. A natural idea is to treat the hidden and output layers in DNNs as better features and use them in the conventional GMM-HMM systems.
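The bottleneck idea described in this abstract, and the concatenation of MLP features with MFCCs mentioned in the citation statements above, can be illustrated with a minimal sketch. It is not taken from any of the cited works: the layer sizes, the random (untrained) weights, the choice to read the bottleneck before its nonlinearity, and the helper names `init_layer`, `bottleneck_features`, and `tandem_frame` are all assumptions made for illustration.

```python
import math
import random

def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(values):
    """Element-wise logistic nonlinearity."""
    return [1.0 / (1.0 + math.exp(-a)) for a in values]

def init_layer(n_out, n_in, rng):
    """Hypothetical helper: random weights stand in for a trained network."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def bottleneck_features(frame, layers, bottleneck_index):
    """Forward pass that stops at the chosen hidden layer and returns its output."""
    h = frame
    for i, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        if i == bottleneck_index:
            return h  # assumption: linear output, read before the nonlinearity
        h = sigmoid(h)
    raise ValueError("bottleneck_index is past the last layer")

def tandem_frame(mfcc, frame, layers, bottleneck_index):
    """Concatenate spectral features with the network-derived features."""
    return list(mfcc) + bottleneck_features(frame, layers, bottleneck_index)

rng = random.Random(0)
# Assumed topology: 39 inputs -> 64 -> 8 (bottleneck) -> 64 -> 40 outputs.
layers = [
    init_layer(64, 39, rng),
    init_layer(8, 64, rng),
    init_layer(64, 8, rng),
    init_layer(40, 64, rng),
]
mfcc = [rng.gauss(0.0, 1.0) for _ in range(39)]  # stand-in for one MFCC frame
features = tandem_frame(mfcc, mfcc, layers, bottleneck_index=1)
print(len(features))  # 39 MFCC values + 8 bottleneck values = 47
```

In this sketch the concatenated vector is what a GMM-HMM system would be trained on; the layers after the bottleneck are only needed while training the network and are skipped at feature-extraction time.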
