Interspeech 2021
DOI: 10.21437/interspeech.2021-1137

A Simultaneous Denoising and Dereverberation Framework with Target Decoupling

citations
Cited by 53 publications
(21 citation statements)
references
References 0 publications
“…The 320-point STFT is utilized and 161-dimension spectral features can be obtained. Due to the efficacy of the compressed spectrum in dereverberation and denoising task [15], [36], we conduct the power compression toward the magnitude while remaining the phase unaltered, and the optimal compression coefficient is set to 0.5, i.e., $\mathrm{Cat}\left(|X|^{0.5}\cos(\theta_X),\ |X|^{0.5}\sin(\theta_X)\right)$ as input, $\mathrm{Cat}\left(|S|^{0.5}\cos(\theta_S),\ |S|^{0.5}\sin(\theta_S)\right)$ as target. All the models are optimized using Adam [37] with the learning rate of 8e-4.…”
Section: B Implementation Setup (mentioning)
confidence: 99%
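The feature pipeline described in the statement above can be sketched roughly as follows. This is a minimal NumPy illustration, not the citing authors' code: the hop size of 160 samples, the Hann window, and the function names `stft`, `compress_spectrum`, and `decompress_spectrum` are all assumptions, since the excerpt only specifies the 320-point STFT, the 161 frequency bins, and the compression coefficient 0.5.

```python
import numpy as np


def stft(signal, n_fft=320, hop=160):
    """Frame-wise one-sided FFT. hop=160 and the Hann window are assumptions."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft of a 320-point frame yields 320 / 2 + 1 = 161 frequency bins.
    return np.fft.rfft(frames, axis=-1)


def compress_spectrum(spec, alpha=0.5):
    """Power-compress the magnitude, leave the phase unchanged,
    and concatenate the resulting real and imaginary parts."""
    mag_c = np.abs(spec) ** alpha
    phase = np.angle(spec)
    return np.stack([mag_c * np.cos(phase), mag_c * np.sin(phase)], axis=0)


def decompress_spectrum(feat, alpha=0.5):
    """Invert compress_spectrum back to a complex spectrum."""
    mag_c = np.hypot(feat[0], feat[1])
    phase = np.arctan2(feat[1], feat[0])
    return (mag_c ** (1.0 / alpha)) * np.exp(1j * phase)


# Hypothetical one-second input at 16 kHz (random noise as a stand-in).
rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000)

spec = stft(noisy)               # shape (99, 161)
feat = compress_spectrum(spec)   # shape (2, 99, 161)
restored = decompress_spectrum(feat)
```

The same `compress_spectrum` call would be applied to the clean spectrum S to form the training target, and `decompress_spectrum` would undo the compression on a network's output before the inverse STFT.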
“…For example, in [14], real-valued convolutional recurrent networks (CRN) were leveraged to directly map the RI components of target speech, where the enhanced RI components were decoded by two decoders respectively. More recently, a handful of multi-stage decoupling-style methods have thrived in the SE area and were demonstrated to achieve a remarkable performance [10], [15], [16]. Instead of packing the mapping process into only one black box in the previous single-stage paradigm, these multi-stage methods decoupled the original complex spectrum estimation into optimizing magnitude and phase stage by stage, and alleviated the implicit compensation effect between two targets [17].…”
Section: Introduction (mentioning)
confidence: 99%
“…DNS-MOS is used to do model training and model selection during noise suppression development. DNSMOS is also used for doing ablation studies for noise suppressors [22,23]. DNS-MOS has been quite popular, with over a hundred researchers using it after several months of releasing it.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, most of the previous studies on speech enhancement are for narrow-band (8 kHz) or wide-band (16 kHz) audio, and there are few methods for 48 kHz full-band audio. Deep learning-based speech enhancement methods [1,2,3] have achieved impressive performance on wide-band audio, but the lack of sufficient training data has become a major limitation for full-band deep learning speech enhancement methods. The recent 4th Microsoft Deep Noise Suppression (DNS-4) Challenge extends efforts to full-band single-channel speech enhancement tasks with a massive training dataset and real-scenario test set.…”
Section: Introduction (mentioning)
confidence: 99%