Effectiveness of discriminative training and feature transformation for reverberated and noisy speech

Tachioka, Yuuki; Watanabe, Shinji; Hershey, John R.

doi:10.1109/icassp.2013.6639006

Cited by 14 publications

(12 citation statements)

References 25 publications

(15 reference statements)

“…Large improvements can be obtained even with a clean recognizer back-end; furthermore, in unseen acoustic conditions the data-based method achieves notable performance compared to the model-based method. Future work will concentrate on the integration of discriminative methods both in the ASR back-end training and in the DAE training, which have proven effective for reverberated speech [28]. Furthermore, for better integration with the ASR back-end, we will investigate improved cost functions in DAE training taking account parameters of the ASR back-end instead of just optimizing distances in the spectral domain.…”

Section: Discussion (mentioning)

confidence: 97%

Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition

Weninger

Watanabe

Tachioka

et al. 2014

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

This paper describes our joint efforts to provide robust automatic speech recognition (ASR) for reverberated environments, such as in hands-free human-machine interaction. We investigate blind feature space de-reverberation and deep recurrent de-noising auto-encoders (DAE) in an early fusion scheme. Results on the 2014 REVERB Challenge development set indicate that the DAE front-end provides complementary performance gains to multi-condition training, feature transformations, and model adaptation. The proposed ASR system achieves word error rates of 17.62 % and 36.6 % on simulated and real data, which is a significant improvement over the Challenge baseline (25.16 and 47.2 %).
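The improvement over the Challenge baseline quoted in this abstract can be restated as a relative word error rate (WER) reduction. The following is a minimal sketch of that arithmetic, using only the four WER figures given above; the function name `relative_wer_reduction` is a hypothetical helper introduced for illustration and is not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates expressed in percent.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# WER figures as quoted in the abstract (percent): baseline vs. proposed system.
simulated = relative_wer_reduction(25.16, 17.62)
real = relative_wer_reduction(47.2, 36.6)

print(f"simulated data: {simulated:.1f}% relative reduction")
print(f"real data: {real:.1f}% relative reduction")
```

Under this definition the quoted figures correspond to roughly a 30% relative reduction on simulated data and roughly 22% on real data.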

“…Yuuki Tachioka et al [12] verified the effectiveness of discriminate training under clean, reverberant and noisy speech using MFCC features, MFCC+LDA+MLLT features and PLP features separately. The feature transformation are effective on the non-stationary interference and reverberation.…”

Section: Kaldi (mentioning)

confidence: 99%

A study on automatic speech recognition toolkits

Sahu

Dontamsetti

2015

2015 International Conference on Microwave, Optical and Communication Engineering (ICMOCE)

The applications of modern speech recognition are becoming more common with the demand of human-machine interactions. Many speech based interactive software applications were executed on the classical general purpose computers. This paper reports an overview about the different speech recognition systems and also about the different speech recognition tools such as HTK, CMU Sphinx, Kaldi and performance metrics of the toolkits.

“…In [13], a model of the noise is estimated from observed data by considering the late reverberation as additive noise, and then the feature vector is enhanced by applying vector Taylor series. A feature transformation based on discriminative training criterion inspired on Maximum Mutual Information is suggested in [14]. Additional features related to the amount of diffuse noise in each frequency bin and frame are employed in [15] to improve deep neural network-based ASR accuracy in noisy and reverberant environments.…”

Section: Distant-talking ASR (mentioning)

confidence: 99%

Reverberant speech recognition exploiting clarity index estimation

Parada

Sharma

Naylor

et al. 2015

EURASIP J. Adv. Signal Process.

We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database employing two different C50 estimators and show that our method outperforms the best baseline of the challenge achieved without unsupervised acoustic model adaptation, i.e. using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4 % relative word error rate reduction in comparison to the best baseline of the challenge.
