2021
DOI: 10.1186/s13636-021-00204-9

Low-complexity artificial noise suppression methods for deep learning-based speech enhancement algorithms

Abstract: Deep learning-based speech enhancement algorithms have shown their powerful ability in removing both stationary and non-stationary noise components from noisy speech observations. But they often introduce artificial residual noise, especially when the training target does not contain the phase information, e.g., ideal ratio mask, or the clean speech magnitude and its variations. It is well-known that once the power of the residual noise components exceeds the noise masking threshold of the human auditory syste…

Cited by 7 publications (3 citation statements)
References 43 publications
“…An objective evaluation of the Wiener filter noise suppression performance was performed in this work by using the signal-to-noise ratio (SNR) and signal-to-noise ratio improvement (SNRI) as speech enhancement measures, and the mean square error (MSE) as signal fidelity measure [52][53][54]. Each is defined as follows.…”
Section: Wiener Filter Performance Metrics (citation type: mentioning)
confidence: 99%
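
The excerpt above names SNR, SNRI, and MSE but its definitions are cut off. As a rough illustration only, the sketch below uses common textbook forms of these measures; they are an assumption and may differ from the definitions in the citing paper. The function names `snr_db`, `snri_db`, and `mse` are hypothetical.

```python
import numpy as np


def snr_db(clean, estimate):
    """SNR in dB, treating (estimate - clean) as the noise component.

    Assumed textbook form: 10 * log10(sum(clean^2) / sum(noise^2)).
    Undefined when estimate equals clean exactly (zero noise energy).
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(estimate, dtype=float) - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


def snri_db(clean, noisy, enhanced):
    """SNR improvement in dB: output SNR minus input SNR (assumed form)."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)


def mse(clean, estimate):
    """Mean square error between the clean and estimated signals."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean((estimate - clean) ** 2))


# Toy signals, made up purely for illustration.
clean = np.array([1.0, -1.0, 1.0, -1.0])
noisy = clean + np.array([0.5, 0.5, -0.5, -0.5])
enhanced = clean + np.array([0.1, 0.1, -0.1, -0.1])

print(snr_db(clean, noisy))
print(snri_db(clean, noisy, enhanced))
print(mse(clean, enhanced))
```

Here SNR and SNRI are computed against a known clean reference, which is only available in simulated or controlled experiments.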
“…This paper proposes a decoupling-style multiband fusion model, dubbed DMF-Net for full-band SE step by step. To be specific, we adopt a two-step strategy which consists of a pre-trained complex-domain-based SE network (LF-Net) for the low-frequency band (0-8 kHz) and two magnitude-based SE networks (MF-Net and HF-Net) for the middle- and high-frequency bands (8-16). In LF-Net, inspired by the preliminary study [9], we employ a decoupling-style strategy for wide-band speech enhancement by three sub-networks.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Note that the estimated low-frequency features are also fed into MF-Net and HF-Net to provide extra guidance. Finally, we fuse the low-, middle- and high-frequency regions to obtain the full-band speech, which is then fed into a low-complexity postprocessing module to further suppress the residual noise [12].…”
Section: Introduction (citation type: mentioning)
confidence: 99%