2021
DOI: 10.48550/arxiv.2111.06015
Preprint

Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation

Abstract: Complex spectrum and magnitude are considered as two major features of speech enhancement and dereverberation. Traditional approaches always treat these two features separately, ignoring their underlying relationship. In this paper, we propose Uformer, a Unet based dilated complex & real dual-path conformer network in both complex and magnitude domain for simultaneous speech enhancement and dereverberation. We exploit time attention (TA) and dilated convolution (DC) to leverage local and global contextual info…

Cited by 1 publication (1 citation statement)
References 33 publications (35 reference statements)
“…Since the clean data of the Realnoisy set is not available, we will not evaluate the topline for the real-world speaker. As for the baseline model, we directly conduct speech enhancement through the state-of-the-art pre-trained Uformer [38] model on all noisy target speakers to obtain denoised speech. For the proposed NC-WaveGAN, we only use the clean and the noisy version of VCTK-clean, where the target noisy speakers in VCTK-noisy and Real-noise are excluded since the WaveGAN model is robust for unseen speakers.…”
Section: And Experimental Setup
Citation type: mentioning
confidence: 99%