A Modulation-Domain Loss for Neural-Network-Based Real-Time Speech Enhancement

Vuong, Tyler; Xia, Yangyang; Stern, Richard M.

doi:10.1109/icassp39728.2021.9414965

Cited by 8 publications (1 citation statement)

References 32 publications (44 reference statements)

“…The final output of the first step is Y^(LPS). And, the final output of the second step is < YR. Waveform reconstruction is a type of step in the enhancement stage [20]. The output from the RNN-LSTM training XLPS is fed into the Exp (), which corresponds to the process of exponential the input.…”

Section: 3 Block Diagram (mentioning)

confidence: 99%

Speech Enhancement with Background Noise Suppression in Various Data Corpus Using Bi-LSTM Algorithm

2024

IJEER

Noise reduction is one of the crucial procedures in today’s teleconferencing scenarios. The signal-to-noise ratio (SNR) is a paramount factor considered for reducing the Bit error rate (BER). Minimizing the BER will result in the increase of SNR which improves the reliability and performance of the communication system. The microphone is the primary audio input device that captures the input signal, as the input signal is carried away it gets interfered with white noise and phase noise. Thus, the output signal is the combination of the input signal and reverberation noise. Our idea is to minimize the interfering noise thus improving the SNR. To achieve this, we develop a real-time speech-enhancing method that utilizes an enhanced recurrent neural network with Bidirectional Long Short Term Memory (Bi-LSTM). One LSTM in this sequence processing framework accepts the input in the forward direction, whereas the other LSTM takes it in the opposite direction, making up the Bi-LSTM. Considering Bi-LSTM, it takes fewer tensor operations which makes it quicker and more efficient. The Bi-LSTM is trained in real-time using various noise signals. The trained system is utilized to provide an unaltered signal by reducing the noise signal, thus making the proposed system comparable to other noise-suppressing systems. The STOI and PESQ metrics demonstrate a rise of approximately 0.5% to 14.8% and 1.77% to 29.8%, respectively, in contrast to the existing algorithms across various sound types and different input signal-to-noise ratio (SNR) levels.
