Interspeech 2022
DOI: 10.21437/interspeech.2022-517
CMGAN: Conformer-based Metric GAN for Speech Enhancement

Cited by 39 publications (20 citation statements); References 0 publications.
“…The outputs of the two decoders are weighted and summed. … time-domain approaches that operate on the raw waveform of speech signals and time-frequency (TF) domain approaches [10–21] that manipulate the speech spectrogram have been proposed. Although the time-domain approaches have had some success, the TF domain approach has dominated the research trend.…”
Section: Introduction
confidence: 99%
“…Typically, most recent studies treat the real and imaginary parts as two separate real-valued sequences and model them with real-valued networks [10–17]. However, the speech spectrogram and the complex targets are naturally complex-valued; richer representations and more efficient modelling could potentially be achieved with complex networks [18,19].…”
Section: Introduction
confidence: 99%
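The distinction drawn in the excerpt above (stacking real and imaginary parts as two real channels versus keeping native complex arithmetic) can be sketched in a few lines of NumPy. The shapes and the weight matrix below are illustrative, not taken from any cited model:

```python
import numpy as np

# A complex spectrogram (time frames x frequency bins), e.g. from an STFT.
spec = np.random.randn(100, 257) + 1j * np.random.randn(100, 257)

# Real-valued modelling: stack real and imaginary parts as two channels
# before feeding a real-valued network, as in [10-17].
two_channel = np.stack([spec.real, spec.imag], axis=0)  # shape (2, 100, 257)

# A complex-valued layer instead keeps the native dtype; one complex
# matrix multiply naturally couples the real and imaginary parts:
W = np.random.randn(257, 257) + 1j * np.random.randn(257, 257)
out_complex = spec @ W  # still complex, shape (100, 257)

# Expressed on the two real channels, the same operation needs the
# (x_r + j x_i)(W_r + j W_i) expansion, i.e. four real matrix multiplies:
out_real = spec.real @ W.real - spec.imag @ W.imag
out_imag = spec.real @ W.imag + spec.imag @ W.real
```

The four-real-multiply expansion and the single complex multiply are algebraically identical; the argument in the excerpt is that complex networks exploit this coupling structurally rather than leaving it for a real-valued network to learn.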
“…This limitation is usually reflected as artifacts in the reconstructed speech.…” (The authors are with the Institute of Signal Processing and System Theory, University of Stuttgart, Germany; e-mail: sherif.abdulatif@iss.uni-stuttgart.de, ruizhe.cao96@gmail.com, bin.yang@iss.uni-stuttgart.de. A shorter version is available at https://arxiv.org/abs/2203.15149 [1].)
Section: Introduction
confidence: 99%
“…In recent years, supervised methods based on deep learning have been widely and successfully used for noise reduction in non-stationary noise environments, with the mainstream methods falling into two categories: time-frequency domain (T-F domain) methods and time-domain methods. T-F domain methods [7–9]: these methods usually apply a short-time Fourier transform (STFT) to the noisy signal to obtain its magnitude and phase, and then obtain the enhanced magnitude by estimating a weighted mask. Finally, the enhanced magnitude and the original phase are combined and the signal is reconstructed by an inverse short-time Fourier transform (iSTFT).…”
Section: Introduction
confidence: 99%
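The masking pipeline described in the excerpts (STFT, magnitude mask, reuse of the noisy phase, iSTFT) can be sketched in plain NumPy. The window and hop sizes are arbitrary choices, and the toy sigmoid-like mask below merely stands in for the mask a trained network would estimate:

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Frame the signal, window each frame, and take its real FFT."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # complex, (frames, n_fft//2 + 1)

def istft(spec, n_fft=512, hop=128):
    """Windowed overlap-add inverse, normalized by the window-power sum."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=-1)
    out = np.zeros((spec.shape[0] - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + n_fft] += f * win
        norm[i * hop : i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

# Masking-based enhancement: mask the magnitude, reuse the noisy phase.
noisy = np.random.randn(16000)               # stand-in for a noisy utterance
spec = stft(noisy)
mag, phase = np.abs(spec), np.angle(spec)
mask = np.clip(mag / (mag + 1.0), 0.0, 1.0)  # stand-in for a learned mask
enhanced = istft(mask * mag * np.exp(1j * phase))
```

With an all-ones mask this round-trips the interior of the signal exactly, which is the property the masking approach relies on: all modelling effort goes into the mask while the (noisy) phase passes through unchanged — the very limitation that motivates complex-spectrogram methods such as CMGAN.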