“…A deep belief network-deep neural network (DBN-DNN) with four hidden layers, ten frames of input temporal context, and a sigmoid nonlinearity is discriminatively trained on the training data, and a tri-gram language model is used in ASR decoding. We compare the ASR performance of the proposed modulation filtering approach with traditional mel filter bank energy (MFB) features, power normalized filter bank energy (PFB) features (Kim and Stern, 2012), the advanced ETSI front-end (ETS) (ETSI, 2002), RASTA features (RAS) (Hermansky and Morgan, 1994), LDA-based features (Van Vuuren and Hermansky, 1997), spectro-temporal Gabor filter features with filter selection (GAB) (Kovacs et al., 2015), MHEC features (MHE) (Sadjadi and Hansen, 2015), and auditory spectrogram features (ASp) (Chi et al., 2005). The results for the proposed data-driven modulation filtering obtained from MFB and ASp are also shown here.…”
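
The acoustic model input described in the excerpt (ten frames of temporal context feeding four sigmoid hidden layers) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimension, hidden layer width, number of output classes, random initialization, edge padding, and the reading of "ten frames" as ten consecutive frames in total are all assumptions made for illustration, and no pretraining or training step is shown.

```python
import numpy as np


def splice_frames(features, context=10):
    """Stack `context` consecutive frames into one input vector per frame.

    features: (num_frames, feat_dim) array. Edge frames are repeated as
    padding so the output keeps num_frames rows (an assumed padding choice).
    """
    num_frames, _ = features.shape
    left = context // 2
    right = context - left - 1
    padded = np.concatenate(
        [
            np.repeat(features[:1], left, axis=0),
            features,
            np.repeat(features[-1:], right, axis=0),
        ],
        axis=0,
    )
    # Row t holds frames t-left .. t+right, flattened into one vector.
    return np.stack(
        [padded[t:t + context].reshape(-1) for t in range(num_frames)]
    )


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_dnn(input_dim, hidden_dim, num_hidden, output_dim, seed=0):
    """Randomly initialized weights; the sizes are hypothetical."""
    rng = np.random.default_rng(seed)
    dims = [input_dim] + [hidden_dim] * num_hidden + [output_dim]
    return [
        (rng.normal(0.0, 0.1, size=(d_in, d_out)), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def forward(params, x):
    """Sigmoid hidden layers followed by a softmax output layer."""
    h = x
    for weights, bias in params[:-1]:
        h = sigmoid(h @ weights + bias)
    weights, bias = params[-1]
    logits = h @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical sizes: 40-dim features, 1024 hidden units, 2000 output classes.
feats = np.random.default_rng(1).normal(size=(50, 40))
spliced = splice_frames(feats, context=10)
dnn = init_dnn(spliced.shape[1], hidden_dim=1024, num_hidden=4, output_dim=2000)
posteriors = forward(dnn, spliced)
```

Each row of `posteriors` is a per-frame distribution over output classes. In a hybrid ASR system these scores would be passed to a decoder together with the language model, a step that is outside the scope of this sketch.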