2020
DOI: 10.1049/iet-spr.2020.0134

Single‐channel dereverberation and denoising based on lower band trained SA‐LSTMs

Abstract: The supervised single‐channel speech enhancement presents one mixture recording at the input of the neural network and updates network parameters in order to generate an output as the reconstructed speech signal. However, current neural networks‐based single‐channel speech enhancement methods are not able to fully utilise pertinence with the specific frequency range of speech signals with limited computational complexity. In this study, the authors studied the power spectral density of mixtures with human spee…

Cited by 9 publications (9 citation statements)
References 36 publications (48 reference statements)
“…Finally, the proposed method denoises the speech mixture in a highly reverberant environment. Future work should be dedicated to exploit the dereverberation pretask [41], [42] to further refine the speech enhancement performance.…”
Section: Discussion (mentioning)
confidence: 99%
“…Although most of the reverberations are removed by DM, the remaining reverberations in Ŷd still limit the performance [7]. Thus, in the second sub-layer, we exploit ERM in the second sub-layer to further improve the speech enhancement in reverberant environments, which can be defined as:…”
Section: Masking Module (mentioning)
confidence: 99%
“…Followed by our previous work [7], to further improve the speech enhancement performance, we introduce both the dereverberation mask (DM) and the estimated ratio mask (ERM) to provide the time-frequency relationships between the clean speech signal and the reverberant mixture. Hence, inspired by [8], we propose a multi pre-tasks SSL method which only needs a limited set of randomly selected clean speech signals and the corresponding mixture recordings in the pre-training.…”
Section: Introduction (mentioning)
confidence: 99%
“…By using short-time Fourier transform (STFT), the state-of-the-art methods estimate the spectrogram of the desired speech signal from the mixture spectrogram (Kumawat and Raman 2020) (Pandey and Wang 2020). However, it has been confirmed that the background noise is uniformly distributed at the full band and human speech occupies in the lower frequency-band (Li, Sun, and Naqvi 2021). Thus, the whole T-F attention map is further divided into three sub attention maps, time attention (TA), high frequency-band attention (HFA), and low frequency-band attention (LFA).…”
Section: Introduction (mentioning)
confidence: 99%
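
As a rough illustration of the lower-band observation quoted in the last statement above, the sketch below measures how much spectrogram energy falls below a cutoff frequency for a toy low-frequency signal and for white noise. It is not the method of the cited paper or of any citing work: the two-tone "speech-like" signal, the 4 kHz cutoff, the 16 kHz sample rate, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Result shape: (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

def low_band_energy_fraction(x, sample_rate, cutoff_hz, frame_len=512, hop=256):
    """Fraction of total spectrogram power at or below cutoff_hz."""
    power = stft_magnitude(x, frame_len, hop) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return power[:, freqs <= cutoff_hz].sum() / power.sum()

# Assumed toy setup: 1 second of audio at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)

# Toy stand-in for speech: two tones well below the assumed 4 kHz cutoff.
speech_like = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
# White noise spreads its energy roughly evenly over the full band.
noise = rng.standard_normal(sample_rate)

speech_frac = low_band_energy_fraction(speech_like, sample_rate, cutoff_hz=4000)
noise_frac = low_band_energy_fraction(noise, sample_rate, cutoff_hz=4000)
print(f"low-band energy fraction, speech-like: {speech_frac:.3f}")
print(f"low-band energy fraction, white noise: {noise_frac:.3f}")
```

With these assumed signals, nearly all of the speech-like energy sits in the lower band, while the white noise puts about half of its energy there, since the cutoff is half the Nyquist frequency. Real speech and real noise are less clean-cut, so the numbers are only indicative.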