Interspeech 2013
DOI: 10.21437/interspeech.2013-48

Improved feature processing for deep neural networks

Cited by 113 publications (19 citation statements)
References 11 publications
“…We choose the Kaldi Toolkit [29] as the ASR back-end system to evaluate the DNN-HMM hybrid system on the 8-channel REVERB Challenge task [30] (WSJ0 trigram 5k language model, circular microphone array with a microphone spacing of 8 cm). As a first step, a GMM-HMM system is trained on the clean WSJCAM0 Cambridge Read News REVERB corpus [31] with feature extraction following the Type-I creation in [32], which is state-of-the-art in the Kaldi recipe [29]. Then, we create a state-frame alignment to train the DNN on the multi-condition training sets (each of 7861 utterances) provided by the REVERB challenge [30].…”
Section: Methods
confidence: 99%
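
The training flow in this excerpt (GMM-HMM on clean data, then a state-frame alignment to supervise DNN training on the multi-condition sets) mirrors the standard Kaldi recipe structure. A minimal orchestration sketch using stock Kaldi recipe scripts is shown below; the data/ and exp/ directory names, job counts, and Gaussian budgets are hypothetical placeholders rather than the cited system's configuration, and the script assumes it runs from a Kaldi egs directory with the usual path setup.

    # Sketch of the clean-train -> align -> DNN-train flow using standard
    # Kaldi recipe scripts. All directory names are hypothetical placeholders.
    import subprocess

    def run(cmd):
        """Run a Kaldi recipe script, failing loudly on error."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1) GMM-HMM trained on the clean corpus (monophone, then triphone)
    run(["steps/train_mono.sh", "--nj", "8",
         "data/train_clean", "data/lang", "exp/mono"])
    run(["steps/align_si.sh", "--nj", "8",
         "data/train_clean", "data/lang", "exp/mono", "exp/mono_ali"])
    run(["steps/train_deltas.sh", "2500", "15000",
         "data/train_clean", "data/lang", "exp/mono_ali", "exp/tri1"])

    # 2) State-frame alignment of the multi-condition training set
    run(["steps/align_si.sh", "--nj", "8",
         "data/train_multi", "data/lang", "exp/tri1", "exp/tri1_ali_multi"])

    # 3) DNN-HMM hybrid trained on those alignments (nnet2 recipe as one option)
    run(["steps/nnet2/train_tanh.sh",
         "data/train_multi", "data/lang", "exp/tri1_ali_multi", "exp/dnn_multi"])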
“…After monophone and triphone training, Mel Frequency Cepstral Coefficients (MFCCs) are processed with Linear Discriminant Analysis (LDA) and a Maximum Likelihood Linear Transform (MLLT). This is followed by Speaker Adaptive Training (SAT) with feature-space MLLR (fMLLR) [27,28]. This HMM-GMM system is denoted Baseline in Table 2.…”
Section: Acoustic Model Training and Evaluation
confidence: 99%
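
The LDA+MLLT step described here corresponds to Kaldi's steps/train_lda_mllt.sh, which estimates LDA on spliced MFCC frames with tied HMM states as classes and then estimates an MLLT on top. Below is a minimal numpy sketch of the LDA estimation alone; the splice width (+/-3 frames), the dimensions, and all variable names are illustrative assumptions rather than the cited systems' settings, and the MLLT stage is omitted.

    # Minimal LDA over spliced MFCC frames (illustrative; Kaldi's
    # steps/train_lda_mllt.sh does this internally, followed by MLLT).
    import numpy as np
    from scipy.linalg import eigh

    def splice(feats, context=3):
        """Stack each frame with +/-context neighbours (edges repeated)."""
        T, d = feats.shape
        padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
        return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

    def estimate_lda(feats, labels, out_dim=40):
        """LDA: maximise between-class over within-class scatter."""
        mean = feats.mean(axis=0)
        d = feats.shape[1]
        Sw = np.zeros((d, d))  # within-class scatter
        Sb = np.zeros((d, d))  # between-class scatter
        for c in np.unique(labels):
            fc = feats[labels == c]
            mc = fc.mean(axis=0)
            Sw += (fc - mc).T @ (fc - mc)
            Sb += len(fc) * np.outer(mc - mean, mc - mean)
        # Generalised eigenproblem Sb v = lambda Sw v; keep top directions.
        vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))
        return vecs[:, np.argsort(vals)[::-1][:out_dim]].T  # (out_dim, d)

    # Example: 13-dim MFCCs, spliced +/-3 frames -> 91 dims, reduced to 40.
    mfcc = np.random.randn(1000, 13)              # stand-in for real MFCCs
    states = np.random.randint(0, 50, size=1000)  # stand-in tied-state labels
    A = estimate_lda(splice(mfcc), states)
    lda_feats = splice(mfcc) @ A.T                # shape (1000, 40)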
“…After monophone and triphone training, input features are processed with Linear Discriminant Analysis (LDA) and a Maximum Likelihood Linear Transform (MLLT). This is followed by Speaker Adaptive Training (SAT) with feature-space MLLR (fMLLR [27]). In the speaker-dependent scenario, each recording session is treated as a separate speaker for SAT.…”
Section: Systems and Results
confidence: 99%
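
In Kaldi terms, "treating each recording session as a separate speaker" for SAT reduces to writing the session identifier as the speaker field of the utt2spk map, so that fMLLR transforms are estimated per session. A tiny sketch, assuming a hypothetical <session>_<utterance> ID format:

    # Sketch: build a Kaldi utt2spk file where the "speaker" is the
    # recording session, so SAT/fMLLR adapts per session.
    # Assumes hypothetical utterance IDs of the form <session>_<uttnum>.
    utt_ids = ["sessA_001", "sessA_002", "sessB_001"]

    with open("data/train/utt2spk", "w") as f:
        for utt in sorted(utt_ids):
            session = utt.rsplit("_", 1)[0]  # session ID acts as speaker ID
            f.write(f"{utt} {session}\n")
    # utils/utt2spk_to_spk2utt.pl then produces the matching spk2utt file.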