2021
DOI: 10.48550/arxiv.2106.12743
Preprint

A Simultaneous Denoising and Dereverberation Framework with Target Decoupling

Abstract: Background noise and room reverberation are regarded as two major factors to degrade the subjective speech quality. In this paper, we propose an integrated framework to address simultaneous denoising and dereverberation under complicated scenario environments. It adopts a chain optimization strategy and designs four sub-stages accordingly. In the first two stages, we decouple the multi-task learning w.r.t. complex spectrum into magnitude and phase, and only implement noise and reverberation removal in the magn…

Cited by 6 publications (10 citation statements)
References 41 publications
“…Four evaluation metrics are used, namely perceptual evaluation of speech quality (PESQ), extended short-time objective intelligibility (eSTOI), DNSMOS [32] [18], namely denoising, dereverberation, spectral refinement and post-processing, respectively. Uformer gives the best performance in both objective and subjective evaluation, while the result of the causal version doesn't degrade generally.…”
Section: Results and Analysis (mentioning)
confidence: 99%
“…We conduct ablation experiments to prove the effectiveness of each proposed sub-modules, including a) Uformer without FA, b) Uformer without DC, c) Uformer without encoder decoder attention, d) substitute dilated complex & real dual-path conformer with complex & real LSTM, e) Uformer without all real-valued sub-modules (means we only model complex spectrum) and f) Uformer without all complex-valued sub-modules (means we only model magnitude). We also compare Uformer with DCCRN, PHASEN [28], GCRN, SDD-Net [18], TasNet and DPRNN [29]. DCCRN, PHASEN and GCRN are in complex domain, which aim to model magnitude and phase simultaneously.…”
Section: Methods (mentioning)
confidence: 99%
“…The 512-point STFT is utilized, leading to a 257-D spectral feature. Due to the efficacy of power compression in dereverberation and denoising tasks [21], we conduct the power compression toward the spectral magnitude while leaving the phase unaltered, and the optimal compression coefficient is set to 0.5. For the non-parallel training strategy, we randomly crop a fixed-length segment (i.e., 108 frames) from a randomly selected noisy audio file as the input, while the target is a randomly selected clean audio file whose content is different with the input audio.…”
Section: Implementation Setup (mentioning)
confidence: 99%
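
The feature pipeline described in the statement above (512-point STFT, power compression of the magnitude with the phase left unaltered, and a random 108-frame crop) can be illustrated with a minimal NumPy sketch. This is not the citing authors' code. The Hann window, the hop size of 256 samples, the 16 kHz sample rate, the synthetic input signal, and the function names `stft`, `power_compress`, and `random_crop` are all assumptions made for illustration; only the FFT size, the compression coefficient of 0.5, and the segment length come from the quoted text.

```python
import numpy as np

N_FFT = 512        # from the quoted setup: 512-point STFT -> 257 frequency bins
HOP = 256          # assumption: the hop size is not given in the quote
ALPHA = 0.5        # from the quoted setup: compression coefficient
SEG_FRAMES = 108   # from the quoted setup: fixed segment length in frames


def stft(signal, n_fft=N_FFT, hop=HOP):
    """Frame the signal, apply a Hann window (assumed), take a one-sided FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)  # shape: (n_frames, n_fft // 2 + 1)


def power_compress(spec, alpha=ALPHA):
    """Raise the magnitude to the power alpha and keep the phase unchanged."""
    magnitude = np.abs(spec)
    phase = np.angle(spec)
    return (magnitude ** alpha) * np.exp(1j * phase)


def random_crop(spec, n_frames=SEG_FRAMES, rng=None):
    """Cut a fixed-length run of consecutive frames at a random offset."""
    rng = np.random.default_rng() if rng is None else rng
    start = rng.integers(0, spec.shape[0] - n_frames + 1)
    return spec[start : start + n_frames]


# Synthetic stand-in for a noisy recording: 4 s of noise at an assumed 16 kHz.
rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000 * 4)

spec = stft(noisy)
segment = random_crop(power_compress(spec), rng=rng)
print(spec.shape[1], segment.shape)
```

In the non-parallel setting described in the quote, the target segment would be drawn the same way from a different, randomly selected clean file; that pairing step is omitted here.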
“…Multi-stage algorithms based on curriculum learning begin to thrive in the SE area [20,21], where the original difficult task is decomposed into multiple simpler sub-tasks and a better result can be induced progressively. Motivated by these studies, we couple a magnitude spectrum estimation CycleGAN (MCGAN) and a complex spectrum refined CycleGAN (CCGAN) as a Cycle-in-Cycle GAN (CinCGAN) paradigm to estimate clean spectral magnitude and phase information step-by-step under non-parallel training.…”
Section: Introduction (mentioning)
confidence: 99%