A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement

Xia, Yangyang; Stern, Richard M.

doi:10.21437/interspeech.2018-2423

Cited by 6 publications

(6 citation statements)

References 18 publications


“…During experiments, we notice that even though our systems trained on MSE (e.g. row 4 in Table 1) could achieve similar objective measures compared to those trained on the proposed weighted losses (12), the corresponding subjective quality of systems trained on the weighted loss is a lot better. The most noticeable improvement of systems trained on our loss functions, especially with small α, is that the estimated gain function is much more frequency-selective than systems trained on regular MSE, resulting in higher noise suppression, especially at high SNRs.…”

Section: Methods (mentioning)

confidence: 94%

“…In this work, we study real-time speech enhancement with recurrent neural network (RNN). Recent works involving RNNs demonstrated promising results [10], even at very low signal-to-noise ratio (SNR) scenarios [11,12].…”

Section: Introduction (mentioning)

confidence: 99%

“…Methods along this line of thought include learning multiple objectives from heterogeneous features [16,17,18], jointly optimizing the final goal and its sub-targets (e.g. speech-presence probability) [10,12], and directly optimizing towards an objective measure of speech quality or intelligibility [19,20]. The latter seems a promising way to improve objective quality, although both models have to incorporate the standard MSE due to the band limitation of each objective measure.…”

Section: Introduction (mentioning)

confidence: 99%


Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement

Xia, Braun, Reddy, et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


This paper investigates several aspects of training an RNN (recurrent neural network) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement. Specifically, we focus on an RNN that enhances short-time speech spectra on a single-frame-in, single-frame-out basis, a framework adopted by most classical signal processing methods. We propose two novel mean-squared-error-based learning objectives that enable separate control over the importance of speech distortion versus noise reduction. The proposed loss functions are evaluated by widely accepted objective quality and intelligibility measures and compared to other competitive online methods. In addition, we study the impact of feature normalization and varying batch sequence lengths on the objective quality of enhanced speech. Finally, we show subjective ratings for the proposed approach and a state-of-the-art real-time RNN-based method.
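The abstract describes mean-squared-error-based objectives that weight speech distortion against noise reduction, and the first citation statement refers to a parameter α. The exact form of the paper's equation (12) is not reproduced on this page, so the following is only a minimal sketch of one plausible weighted loss. The function name, argument names, and the convention that α weights the speech-distortion term are all assumptions for illustration.

```python
import numpy as np

def weighted_distortion_loss(gain, speech_mag, noise_mag, alpha=0.5):
    """Hypothetical weighted loss (assumed form, not taken from the paper).

    gain       : estimated gain per time-frequency bin
    speech_mag : clean speech magnitude per bin
    noise_mag  : noise magnitude per bin
    alpha      : assumed weight on speech distortion; (1 - alpha) weights residual noise
    """
    gain = np.asarray(gain, dtype=float)
    speech_mag = np.asarray(speech_mag, dtype=float)
    noise_mag = np.asarray(noise_mag, dtype=float)

    # Speech distortion: clean speech removed by applying the gain.
    speech_term = np.mean((speech_mag - gain * speech_mag) ** 2)
    # Residual noise: noise that survives after applying the gain.
    noise_term = np.mean((gain * noise_mag) ** 2)
    return alpha * speech_term + (1.0 - alpha) * noise_term
```

Under this assumed convention, a small α penalizes residual noise more heavily, which would be consistent with the citation statement's remark that small α leads to higher noise suppression.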


“…Unlike previous a priori SNR estimators, the proposed estimators do not require a noise estimator. Recently, a recurrent neural network (RNN) was used to aid the DD approach in a priori SNR estimation [15]. The proposed estimators differ by directly estimating the a priori SNR.…”

Section: Accepted Manuscript (mentioning)

confidence: 99%

“…The proposed a priori SNR estimators significantly outperform the previous a priori SNR estimation methods. Evaluating the results in [15], the RNN-assisted DD approach (a deep learning-based a priori SNR estimator) could only outperform the DD approach at higher SNR levels (5 dB and greater for signal-to-distortion ratio (SDR)). Here, the ResLSTM and ResBLSTM a priori SNR estimators significantly outperform the DD approach for all conditions.…”

Section: Accepted Manuscript (mentioning)

confidence: 99%
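The two statements above compare learned estimators against the DD (decision-directed) approach. For orientation, here is a minimal sketch of the classical decision-directed a priori SNR estimate for one frame. It is the textbook form, not the RNN-assisted variant from the cited paper, and the function name, argument names, and the default smoothing factor of 0.98 are assumptions for illustration.

```python
import numpy as np

def decision_directed_snr(noisy_power, noise_power, prev_clean_power, alpha=0.98):
    """Classical decision-directed a priori SNR estimate for one frame (sketch).

    noisy_power      : |Y|^2 of the current frame, per frequency bin
    noise_power      : noise power estimate per bin (must be > 0)
    prev_clean_power : |S_hat|^2 from the previous enhanced frame, per bin
    alpha            : smoothing factor (0.98 is a commonly used value, assumed here)
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    prev_clean_power = np.asarray(prev_clean_power, dtype=float)

    # A posteriori SNR of the current frame.
    gamma = noisy_power / noise_power
    # Maximum-likelihood term, floored at zero.
    ml_term = np.maximum(gamma - 1.0, 0.0)
    # Blend the previous frame's estimate with the current ML term.
    return alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
```

Note that this estimate needs `noise_power` as an input, which is the dependence on a separate noise estimator that the first statement says the proposed estimators avoid.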

Deep learning for minimum mean-square error approaches to speech enhancement

Nicolson, Paliwal 2019

Speech Communication

113

124


Recently, the focus of speech enhancement research has shifted from minimum mean-square error (MMSE) approaches, like the MMSE short-time spectral amplitude (MMSE-STSA) estimator, to state-of-the-art masking- and mapping-based deep learning approaches. We aim to bridge the gap between these two differing speech enhancement approaches. Deep learning methods for MMSE approaches are investigated in this work, with the objective of producing intelligible enhanced speech at a high quality. Since the speech enhancement performance of an MMSE approach improves with the accuracy of the used a priori signal-to-noise ratio (SNR) estimator, a residual long short-term memory (ResLSTM) network is utilised here to accurately estimate the a priori SNR. MMSE approaches utilising the ResLSTM a priori SNR estimator are evaluated using subjective and objective measures of speech quality and intelligibility. The tested conditions include real-world non-stationary and coloured noise sources at multiple SNR levels. MMSE approaches utilising the proposed a priori SNR estimator are able to achieve higher enhanced speech quality and intelligibility scores than recent masking- and mapping-based deep learning approaches. The results presented in this work show that the performance of an MMSE approach to speech enhancement significantly increases when utilising deep learning.
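To illustrate why the accuracy of the a priori SNR estimate matters, the sketch below maps an a priori SNR value to a spectral gain. It uses the Wiener gain as a simpler stand-in for the MMSE-STSA estimator named in the abstract, and the helper names are hypothetical.

```python
import numpy as np

def xi_from_db(xi_db):
    """Convert an a priori SNR given in dB to linear scale."""
    return 10.0 ** (np.asarray(xi_db, dtype=float) / 10.0)

def wiener_gain(xi):
    """Wiener gain G = xi / (1 + xi) for a linear-scale a priori SNR xi >= 0.

    Used here as a simple stand-in; the cited work evaluates MMSE estimators
    such as MMSE-STSA, whose gain functions are more involved.
    """
    xi = np.asarray(xi, dtype=float)
    return xi / (1.0 + xi)
```

Because the gain is a direct function of the a priori SNR, any error in that estimate carries over to the gain applied to each frequency bin.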


Efficient Speech Enhancement Using Recurrent Convolution Encoder and Decoder

Karthik, MazherIqbal 2021

Wireless Pers Commun

