Yangyang Xia scite author profile

This paper investigates several aspects of training a recurrent neural network (RNN) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement. Specifically, we focus on an RNN that enhances short-time speech spectra on a single-frame-in, single-frame-out basis, a framework adopted by most classical signal processing methods. We propose two novel mean-squared-error-based learning objectives that enable separate control over the importance of speech distortion versus noise reduction. The proposed loss functions are evaluated by widely accepted objective quality and intelligibility measures and compared to other competitive online methods. In addition, we study the impact of feature normalization and varying batch sequence lengths on the objective quality of enhanced speech. Finally, we show subjective ratings for the proposed approach and a state-of-the-art real-time RNN-based method.
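The idea of controlling speech distortion and noise reduction separately can be illustrated with a small sketch. This is not the paper's exact loss definition, which the abstract does not give. It assumes a per-bin gain applied to one frame, training-time access to the clean speech and noise magnitudes, and a hypothetical weight `alpha` that trades one error term against the other.

```python
def weighted_distortion_loss(gain, speech_mag, noise_mag, alpha=0.5):
    """Weighted sum of a speech-distortion term and a residual-noise term.

    gain, speech_mag, noise_mag: per-frequency-bin values for one frame.
    alpha is a hypothetical weight: 1.0 penalizes only speech distortion,
    0.0 penalizes only residual noise.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(gain)
    if n == 0 or len(speech_mag) != n or len(noise_mag) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    # Speech distortion: how far the gained clean speech is from the clean speech.
    distortion = sum((s - g * s) ** 2 for g, s in zip(gain, speech_mag)) / n
    # Residual noise: energy of the noise that survives the gain.
    residual = sum((g * v) ** 2 for g, v in zip(gain, noise_mag)) / n

    return alpha * distortion + (1.0 - alpha) * residual


if __name__ == "__main__":
    gain = [1.0, 0.5]
    speech = [2.0, 2.0]
    noise = [1.0, 1.0]
    for a in (0.0, 0.5, 1.0):
        print(a, weighted_distortion_loss(gain, speech, noise, alpha=a))
```

In this sketch, raising `alpha` pushes the gain toward 1 (less speech distortion, more residual noise), and lowering it pushes the gain toward 0 (more noise suppression, more distortion).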

A greedy traffic light and queue aware routing protocol for urban VANETs

Xia, Qin, Liu, et al. 2018. China Commun.

Domain-Specific Suppression for Adaptive Object Detection

Wang, Zhang, Zhang³, et al. 2021

Weighted Speech Distortion Losses for Neural-network-based Real-time Speech Enhancement

Xia, Braun, Reddy, et al. 2020. Preprint

A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement

Xia, Stern. 2018

Speech enhancement under highly non-stationary noise conditions remains a challenging problem. Classical methods typically attempt to identify a frequency-domain optimal gain function that suppresses noise in noisy speech. These algorithms typically produce artifacts such as "musical noise" that are detrimental to machine and human understanding, largely due to inaccurate estimation of noise power spectra. The optimal gain function is commonly referred to as the ideal ratio mask (IRM) in neural-network-based systems, and the goal becomes estimation of the IRM from the short-time Fourier transform amplitude of degraded speech. While these data-driven techniques are able to enhance speech quality with reduced artifacts, they are frequently not robust to types of noise that they were not exposed to during training. In this paper, we propose a novel recurrent neural network (RNN) that bridges the gap between classical and neural-network-based methods. By reformulating the classical decision-directed approach, the a priori and a posteriori SNRs become latent variables in the RNN, from which the frequency-dependent estimated likelihood of speech presence is used to recursively update the latent variables. The proposed method provides substantial enhancement of speech quality and objective accuracy in machine interpretation of speech.
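For context, the classical decision-directed approach mentioned in the abstract can be sketched for a single frequency bin as follows. This is the textbook recursion paired with a Wiener gain, not the RNN reformulation proposed in the paper. The constant, known noise power, the smoothing factor `alpha`, the floor `xi_min`, and the initial state are all assumptions made for illustration.

```python
def decision_directed_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Wiener gains for one frequency bin via decision-directed SNR estimation.

    noisy_power: sequence of |Y_t|^2 values for successive frames.
    noise_power: noise power estimate, assumed constant and known here.
    alpha: smoothing factor between the previous and current estimates.
    xi_min: floor on the a priori SNR (hypothetical default).
    """
    if noise_power <= 0.0:
        raise ValueError("noise_power must be positive")

    gains = []
    prev_gain, prev_post = 1.0, 1.0  # initial state (an assumption)
    for y2 in noisy_power:
        post = y2 / noise_power  # a posteriori SNR
        # Blend the previous frame's clean-speech estimate with the
        # current maximum-likelihood estimate max(post - 1, 0).
        xi = alpha * prev_gain ** 2 * prev_post + (1.0 - alpha) * max(post - 1.0, 0.0)
        xi = max(xi, xi_min)  # a priori SNR, floored
        gain = xi / (1.0 + xi)  # Wiener gain
        gains.append(gain)
        prev_gain, prev_post = gain, post
    return gains


if __name__ == "__main__":
    # Noise-only frames drive the gain down; high-SNR frames drive it up.
    print(decision_directed_gains([1.0, 1.0, 1.0], noise_power=1.0))
    print(decision_directed_gains([100.0, 100.0, 100.0], noise_power=1.0))
```

The recursion depends heavily on an accurate `noise_power`, which is the weakness the abstract attributes to classical methods.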
