2021
DOI: 10.1109/jstsp.2020.3045846

A Multi-Scale Feature Recalibration Network for End-to-End Single Channel Speech Enhancement

Abstract: Deep neural networks based methods dominate recent development in single channel speech enhancement. In this paper, we propose a multi-scale feature recalibration convolutional encoder-decoder with bidirectional gated recurrent unit (BGRU) architecture for end-to-end speech enhancement. More specifically, multi-scale recalibration 2-D convolutional layers are used to extract local and contextual features from the signal. In addition, a gating mechanism is used in the recalibration network to control the inform…
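The abstract is cut off before it describes the gating mechanism, so the following is only a minimal sketch of the general idea of multi-scale convolution followed by gated recalibration, not the paper's architecture. It assumes a squeeze-and-excitation style gate (one sigmoid weight per scale, computed from that scale's global average). The function names, kernel sizes, and gate weights are all hypothetical and chosen for illustration.

```python
import math


def conv2d_same(x, k):
    """Zero-padded 'same' 2-D cross-correlation of matrix x with an odd-sized square kernel k."""
    h, w = len(x), len(x[0])
    r = len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    # Positions outside the input count as zeros.
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += x[ii][jj] * k[di + r][dj + r]
            out[i][j] = acc
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multi_scale_recalibrate(x, kernels, gate_weights, gate_bias=0.0):
    """Assumed scheme: one feature map per kernel size, each rescaled by a sigmoid gate."""
    # Multi-scale step: small kernels see local detail, larger ones more context.
    maps = [conv2d_same(x, k) for k in kernels]
    # Squeeze step: summarise each map by its global average.
    means = [sum(map(sum, m)) / (len(m) * len(m[0])) for m in maps]
    # Gate step: one weight in (0, 1) per scale (hypothetical scalar gate parameters).
    gates = [sigmoid(w * mu + gate_bias) for w, mu in zip(gate_weights, means)]
    # Recalibration step: scale every value of a map by that map's gate.
    scaled = [[[g * v for v in row] for row in m] for g, m in zip(gates, maps)]
    return scaled, gates


# Toy input standing in for a 4x4 time-frequency patch.
x = [[1.0] * 4 for _ in range(4)]
kernels = [
    [[1.0]],                               # 1x1: purely local
    [[1.0 / 9.0] * 3 for _ in range(3)],   # 3x3 mean: wider context
]
scaled, gates = multi_scale_recalibrate(x, kernels, gate_weights=[1.0, 1.0])
```

In a trained network the kernels and gate parameters would be learned, and the gate would typically be a small network rather than a single scalar weight per scale; the sketch only shows how a gate can pass or suppress each scale's features.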

Cited by 11 publications (4 citation statements)
References 31 publications
“…In speech enhancement problem, the Short-time Fourier transform (STFT) is widely used [14], [15], [16] to achieve feature extraction. The main idea is supported by the recent research in speech processing that it is not clear what cues in which scales of window lengths contribute most to the final performance [8].…”
Section: Related Work | mentioning
confidence: 99%
“…Numerous approaches have been presented for extracting features at multiple scales that improve the quality of speech. In [15], the author proposed a multiscale feature network that uses different-sized convolution kernels to extract features at multiple scales. Due to the massive convolution kernel, the model has a large number of trainable parameters.…”
Section: Introduction | mentioning
confidence: 99%
“…Considering the processing domain, traditional speech-enhancement methods can be divided into three categories, namely time [6][7][8][9][10], frequency [11][12][13][14] and time-frequency [15][16][17][18] domain methods. Despite the significant development of speech-enhancement techniques [19][20][21][22][23][24][25] from particular categories in recent years, there is still a problem of obtaining a high-quality speech signal with high noise attenuation. Typically, if a method maintains high noise attenuation properties, such a method does not always provide a significant improvement in speech quality.…”
Section: Introduction | mentioning
confidence: 99%