2019
DOI: 10.48550/arxiv.1901.00660
Preprint

Deep Speech Enhancement for Reverberated and Noisy Signals using Wide Residual Networks

Dayana Ribas,
Jorge Llombart,
Antonio Miguel
et al.

Abstract: This paper proposes a deep speech enhancement method which exploits the high potential of residual connections in a wide neural network architecture, a topology known as a Wide Residual Network. It is built on one-dimensional convolutions computed along the time axis, a powerful approach for processing contextually correlated representations through the temporal domain, such as speech feature sequences. We find the residual mechanism extremely useful for the enhancement task since the signal …
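The abstract's core idea — residual blocks built from one-dimensional convolutions along the time axis — can be sketched in plain numpy. This is a minimal illustration under assumed simplifications (a single depthwise kernel shared across all feature channels, one block, ReLU nonlinearity), not the paper's actual architecture:

```python
import numpy as np

def conv1d_time(x, w):
    """1-D convolution along the time axis with 'same' padding.

    x: (T, F) feature sequence, w: (K,) kernel (odd K assumed) shared
    across feature channels -- a simplification of the paper's wide
    temporal convolutions, used here only for illustration."""
    K = len(w)
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([xp[t:t + K].T @ w for t in range(x.shape[0])])

def wide_residual_block(x, w1, w2):
    """Residual block: two temporal convolutions plus an identity shortcut,
    so the network only has to model the correction to its input."""
    h = np.maximum(conv1d_time(x, w1), 0.0)  # ReLU nonlinearity
    h = conv1d_time(h, w2)
    return x + h                             # skip connection
```

With an identity kernel in the first convolution and a zero second kernel, the block reduces to the pure skip path, which is what makes residual stacks easy to optimize.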

Cited by 3 publications (6 citation statements)
References 41 publications (46 reference statements)
“…In order to direct the RI2RI model to focus on the phase estimation task, it was pre-trained with synthetic data constructed from clean magnitude and noisy phase. The pre-training of this model also utilizes (5).…”
Section: Training Objectives
confidence: 99%
“…In recent years, deep neural network (DNN)-based models were utilized to deal with this challenge. The majority of these methods attempt to enhance the magnitude of the noisy and reverberant short-time Fourier transform (STFT) [3][4][5]. In these approaches, the enhanced magnitude is combined with the noisy phase and then inverse-transformed to the time domain.…”
Section: Introduction
confidence: 99%
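The magnitude-plus-noisy-phase pipeline this quote describes can be sketched as follows; `enhance_mag` stands in for the DNN and is a hypothetical placeholder, and framing/windowing is omitted for brevity:

```python
import numpy as np

def enhance_with_noisy_phase(noisy_frames, enhance_mag):
    """Classic magnitude-domain enhancement: enhance |STFT|, reuse noisy phase.

    noisy_frames: (T, N) time-domain frames; enhance_mag: any callable on the
    magnitude spectrogram (a stand-in for the enhancement DNN)."""
    spec = np.fft.rfft(noisy_frames, axis=1)        # complex spectrum per frame
    mag, phase = np.abs(spec), np.angle(spec)
    est_mag = enhance_mag(mag)                      # magnitude-only enhancement
    est_spec = est_mag * np.exp(1j * phase)         # recombine with noisy phase
    return np.fft.irfft(est_spec, n=noisy_frames.shape[1], axis=1)
```

Because the phase is left untouched, an identity magnitude enhancer reconstructs the input exactly — which also shows the pipeline's known limitation: any residual distortion carried by the noisy phase survives enhancement.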
“…Additionally, advances in technologies such as hearing aids require speech systems to enhance the perceptual quality of speech captured in adverse environmental conditions, thus improving human hearing abilities. Several deep learning (DL)-based speech enhancement systems have been successfully developed to address concurrent improvements in perceptual quality and performance of back-end speech and language applications using fully convolutional neural networks (FCN) and recurrent networks (RNN) [9,10,11,12]. The majority of these approaches work with the complex short-time Fourier transform (STFT) of distorted speech, either to enhance the log-power spectrum (LPS) and reuse the unaltered distorted phase signal [13,14,15,16,17], or to estimate the complex ratio mask (cRM) [18,19,20] and directly enhance the complex spectrogram to restore a cleaner time-domain signal.…”
Section: Introduction
confidence: 99%
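The complex ratio mask (cRM) alternative mentioned above can be illustrated with a small numpy sketch. Unlike a real-valued magnitude mask, a cRM is a complex multiplier, so it corrects phase as well as magnitude; the oracle mask below is for intuition only, not any cited model:

```python
import numpy as np

def apply_crm(noisy_spec, mask_real, mask_imag):
    """Apply a complex ratio mask: a complex multiply per TF bin that
    scales the magnitude AND rotates the phase of the noisy spectrogram."""
    return noisy_spec * (mask_real + 1j * mask_imag)

def ideal_crm(noisy_spec, clean_spec, eps=1e-8):
    """Oracle cRM: the complex ratio clean/noisy per bin (illustrative;
    practical systems estimate a bounded version of this with a DNN)."""
    m = clean_spec / (noisy_spec + eps)
    return m.real, m.imag
```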
“…As deep neural networks (DNN) advance to be compatible with complex representations, researchers have investigated many speech enhancement strategies to estimate cRM using deep complex neural networks (DCNN). To address reverberation, which distorts the signal in both time and frequency, many sequence-to-sequence learning strategies such as recurrent neural networks (RNNs) and long short-term memory (LSTM) [21,11] have also been explored. In addition to the FCNs, these methods capture and leverage temporal correlations for speech dereverberation.…”
Section: Introduction
confidence: 99%
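A single LSTM step — the recurrent building block these dereverberation approaches rely on — fits in a few lines of numpy. The weight shapes and i/f/o/g gate ordering below are one common convention, assumed here for illustration:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step carrying state (h, c) across the sequence,
    which is how the temporal smearing of reverberation gets modeled.

    x: (D,) input frame; h, c: (H,) previous hidden/cell state;
    W: (4H, D), U: (4H, H), b: (4H,) -- illustrative shapes, not tied
    to any cited system."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)                 # input/forget/output gates, candidate
    i, f, o = (1.0 / (1.0 + np.exp(-v)) for v in (i, f, o))  # sigmoid gates
    c_new = f * c + i * np.tanh(g)              # gated cell update
    h_new = o * np.tanh(c_new)                  # gated hidden output
    return h_new, c_new
```

Iterating this step over the frames of an utterance gives the sequence-to-sequence mapping the quote refers to; bidirectional or stacked variants just compose more of the same step.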
“…Frequency-domain features such as the STFT, Gammatone spectrum and Mel-frequency cepstral coefficients (MFCC) have been used frequently. In addition, a combination of STFT with MFCC is employed in [9] for training wide residual networks for speech enhancement. Compared to STFT, filter-based features like MFCC exhibit reduced dimensionality and are more suitable for learning algorithms, as they can reduce memory and computational requirements while maintaining a comparable level of performance [7], [10], [11].…”
Section: Introduction
confidence: 99%
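The dimensionality reduction claimed for filter-based features can be made concrete with a minimal triangular mel filterbank: 257 STFT bins collapse to a few dozen mel bands before any cepstral step. The construction below is illustrative (approximate bin mapping, parameters assumed, not matched to any specific toolkit):

```python
import numpy as np

def mel_filterbank(n_fft_bins, n_mels, sr=16000):
    """Triangular filters spaced evenly on the mel scale.

    Returns fb of shape (n_mels, n_fft_bins), so a magnitude spectrogram
    (T, n_fft_bins) maps to a compact (T, n_mels) representation."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft_bins - 1) * mel_to_hz(mel_pts) / (sr / 2)).astype(int)
    fb = np.zeros((n_mels, n_fft_bins))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):           # rising slope
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling slope
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb
```

Applying `spec @ fb.T` to a (T, 257) magnitude spectrogram yields a (T, 40) mel spectrogram — the memory/compute saving the quote refers to; MFCCs would add a log and DCT on top.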