A CASA-Based System for Long-Term SNR Estimation

Narayanan, Arun; Wang, DeLiang

doi:10.1109/tasl.2012.2205242

Cited by 30 publications (14 citation statements)

References 27 publications (51 reference statements)


“…This is because classifiers are trained to distinguish between speech- and noise-dominant T-F units, and the acoustic characteristics of speech-dominant units are generally different from those of noise-dominant units, even when that noise is babble. We consider SNR mismatch to be of less concern than noise mismatch because SNR estimation can be performed with reasonable accuracy (Kim and Stern, 2008; Narayanan and Wang, 2012). Regarding noise mismatch, recent effort has been made to address this issue.…”

Section: Discussion (mentioning)

confidence: 99%

An algorithm to improve speech recognition in noise for hearing-impaired listeners

Healy, Yoho, Wang, et al. 2013

The Journal of the Acoustical Society of America


Despite considerable effort, monaural (single-microphone) algorithms capable of increasing the intelligibility of speech in noise have remained elusive. Successful development of such an algorithm is especially important for hearing-impaired (HI) listeners, given their particular difficulty in noisy backgrounds. In the current study, an algorithm based on binary masking was developed to separate speech from noise. Unlike the ideal binary mask, which requires prior knowledge of the premixed signals, the masks used to segregate speech from noise in the current study were estimated by training the algorithm on speech not used during testing. Sentences were mixed with speech-shaped noise and with babble at various signal-to-noise ratios (SNRs). Testing using normal-hearing and HI listeners indicated that intelligibility increased following processing in all conditions. These increases were larger for HI listeners, for the modulated background, and for the least-favorable SNRs. They were also often substantial, allowing several HI listeners to improve intelligibility from scores near zero to values above 70%.


“…The bottom-up mask is estimated using a recently proposed system described in [31], which combines masks estimated by CASA based [19] and speech enhancement based methods [32]. The speech enhancement based mask uses an LC of -5 dB.…”

Section: Methods (mentioning)

confidence: 99%

Coupling binary masking and robust ASR

Narayanan, Wang 2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing


We present a novel framework for performing speech separation and robust automatic speech recognition (ASR) in a unified fashion. Separation is performed by estimating the ideal binary mask (IBM), which identifies speech dominant and noise dominant units in a time-frequency (T-F) representation of the noisy signal. ASR is performed on extracted cepstral features after binary masking. Previous systems perform these steps in a sequential fashion: separation followed by recognition. The proposed framework, which we call bidirectional speech decoding (BSD), unifies these two stages. It does this by using multiple IBM estimators each of which is designed specifically for a back-end acoustic phonetic unit (BPU) of the recognizer. The standard ASR decoder is modified to use these IBM estimators to obtain BPU-specific cepstra during likelihood calculation. On the Aurora-4 robust ASR task, the proposed framework obtains a relative improvement of 17% in word error rate over the noisy baseline. It also obtains significant improvements in the quality of the estimated IBM.
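The ideal binary mask described in this abstract can be illustrated with a minimal sketch. It assumes the premixed speech and noise energies are known for every T-F unit and labels a unit speech-dominant when its local SNR exceeds a local criterion (LC). The function name `ideal_binary_mask`, the toy energies, and the 0 dB default are assumptions made for illustration, not the authors' implementation.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Return a 0/1 mask with 1 where the local SNR in dB exceeds lc_db.

    speech_energy and noise_energy are equally shaped lists of rows,
    one non-negative energy value per time-frequency unit (hypothetical layout).
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                # No speech energy: the unit cannot be speech-dominant.
                row.append(0)
            elif n <= 0.0:
                # Speech present and no noise: local SNR is unbounded.
                row.append(1)
            else:
                local_snr_db = 10.0 * math.log10(s / n)
                row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask


# Toy 2x2 example (made-up energies).
speech = [[4.0, 0.1], [1.0, 0.5]]
noise = [[1.0, 1.0], [1.0, 2.0]]
mask_0db = ideal_binary_mask(speech, noise)
mask_m5db = ideal_binary_mask(speech, noise, lc_db=-5.0)
```

Lowering the LC, for example to the -5 dB value quoted in the citation statement above, admits more units into the speech-dominant class: in the toy data the unit with a local SNR of exactly 0 dB switches from 0 to 1.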


“…After subtracting the noise spectrum from the input signal to obtain the clean signal, SNR is estimated. In [13], computational auditory scene analysis is used to estimate speech dominated and noise dominated portions of the signal in order to obtain SNR.…”
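One simplified way to read the idea in this statement is sketched below: given a binary mask over T-F units, the mixture energy in speech-dominated units is taken as a rough proxy for speech energy, and the energy in noise-dominated units as a proxy for noise energy. This is an assumption-laden illustration, not the procedure of the cited system. The function name `snr_from_mask` and the toy values are hypothetical.

```python
import math


def snr_from_mask(mixture_energy, mask):
    """Estimate a long-term SNR in dB from per-unit mixture energies and a 0/1 mask.

    Energy in units labelled 1 is counted as speech, energy in units labelled 0
    as noise. This is a crude approximation used only for illustration.
    """
    speech = 0.0
    noise = 0.0
    for e_row, m_row in zip(mixture_energy, mask):
        for energy, label in zip(e_row, m_row):
            if label:
                speech += energy
            else:
                noise += energy
    if speech <= 0.0 or noise <= 0.0:
        raise ValueError("need non-zero energy in both speech and noise units")
    return 10.0 * math.log10(speech / noise)


# Toy example (made-up energies): 15.0 in speech units, 1.5 in noise units.
mixture = [[10.0, 1.0], [5.0, 0.5]]
mask = [[1, 0], [1, 0]]
estimated_snr_db = snr_from_mask(mixture, mask)
```

In the toy data the ratio of speech-labelled to noise-labelled energy is 10, so the estimate comes out at 10 dB. A real system would also have to account for noise energy inside speech-dominated units, which this sketch ignores.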

Section: Related Work (mentioning)

confidence: 99%

Global SNR Estimation of Speech Signals Using Entropy and Uncertainty Estimates from Dropout Networks

Aralikatti, Margam, Sharma, et al. 2018

Interspeech 2018


This paper demonstrates two novel methods to estimate the global SNR of speech signals. In both methods, the Deep Neural Network-Hidden Markov Model (DNN-HMM) acoustic model used in speech recognition systems is leveraged for the additional task of SNR estimation. In the first method, the entropy of the DNN-HMM output is computed. Recent work on Bayesian deep learning has shown that a DNN-HMM trained with dropout can be used to estimate model uncertainty by approximating it as a deep Gaussian process. In the second method, this approximation is used to obtain model uncertainty estimates. Noise-specific regressors are used to predict the SNR from the entropy and model uncertainty. The DNN-HMM is trained on the GRID corpus and tested on different noise profiles from the DEMAND noise database at SNR levels ranging from -10 dB to 30 dB.
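The entropy feature used in the first method can be illustrated with a minimal sketch. It assumes the acoustic model emits one posterior distribution per frame and that the per-frame Shannon entropies are averaged over the utterance. The averaging step and the names `posterior_entropy` and `mean_frame_entropy` are assumptions, not details taken from the paper.

```python
import math


def posterior_entropy(posteriors):
    """Shannon entropy (in nats) of one posterior distribution over classes."""
    # Zero-probability classes contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def mean_frame_entropy(frames):
    """Average the per-frame entropies over an utterance (assumed pooling)."""
    if not frames:
        raise ValueError("need at least one frame")
    return sum(posterior_entropy(frame) for frame in frames) / len(frames)


# Toy posteriors: a confident frame and a maximally uncertain frame.
confident = [1.0, 0.0, 0.0, 0.0]
uncertain = [0.25, 0.25, 0.25, 0.25]
utterance_entropy = mean_frame_entropy([confident, uncertain])
```

The intuition is that a confident posterior has entropy near zero while a flat one reaches the maximum of log(number of classes). Noisier input would be expected to push the average toward the flat case, which is what makes the value usable as a regressor input.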
