Interspeech 2019
DOI: 10.21437/interspeech.2019-1567

UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-Noise Ratio Condition

Abstract: Speech enhancement under extremely low signal-to-noise ratio (SNR) conditions is a very challenging problem and has rarely been investigated in previous work. This paper proposes a robust speech enhancement approach (UNetGAN) based on U-Net and generative adversarial learning to deal with this problem. The approach consists of a generator network and a discriminator network, both of which operate directly in the time domain. The generator network adopts a U-Net-like structure and employs dilated convolution in the bottleneck of …
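The abstract mentions dilated convolution in the generator's bottleneck, applied directly to time-domain signals. The following is a minimal NumPy sketch of what a 1-D dilated convolution does and how stacking such layers widens the receptive field. The kernel size of 3 and the dilation rates 1, 2, 4, 8 are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    k = len(kernel)
    span = (k - 1) * dilation + 1      # input samples covered by one output sample
    out_len = len(x) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        # Kernel taps are spaced `dilation` samples apart.
        out[i] = np.dot(x[i : i + span : dilation], kernel)
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolution layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Toy waveform: the samples 0, 1, ..., 15.
x = np.arange(16, dtype=float)

# Summing kernel with dilation 2: y[0] = x[0] + x[2] + x[4].
y = dilated_conv1d(x, np.array([1.0, 1.0, 1.0]), dilation=2)

# Assumed dilation schedule (illustrative only).
rf = receptive_field(3, [1, 2, 4, 8])
```

With these assumed settings, four layers cover 31 input samples, whereas four undilated layers of the same kernel size would cover 9. This is the usual motivation for dilation when a model works on raw waveforms.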

Cited by 30 publications (12 citation statements)
References: 28 publications
“…In this way, the decoding layers to preserve source linguistic information in the reconstruction stage. Moreover, the U-net structure enables the gradients to flow deeper through the whole network [17], [31], and achieves more effective training.…”
Section: B 2-1-2d Attention-based Generator; citation type: mentioning
confidence: 99%
“…There is also a 3D version of U-Net [35] designed for tackling three-dimensional problems. In the 1D domain, apart from the PPG to ABP signal translation discussed earlier, there have also been works in speech enhancement [36], echo cancellation [37], heartbeat detection, etc. Thus, the U-Net architecture has been modified in various ways for solving different types of problems and in a few cases, a shallow U-Net performed better than a deeper version.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…These methods are often trained in a supervised setting and can be divided into time-domain and frequency-domain methods. The time-domain methods [1][2][3] use the neural network to map noisy speech waveform to clean speech waveform directly. The frequency-domain methods [4][5][6][7] typically use the noisy spectral feature (e.g., complex spectrum, magnitude spectrum) as the input of a neural model.…”
Section: Introduction; citation type: mentioning
confidence: 99%
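The last citation statement contrasts time-domain methods, which take the noisy waveform as input, with frequency-domain methods, which take a spectral feature such as the magnitude spectrum. The sketch below illustrates only the difference in model input, using NumPy. The 16 kHz sampling rate, 512-sample frame, 256-sample hop, and Hann window are assumptions chosen for illustration, and random noise stands in for real noisy speech.

```python
import numpy as np

def magnitude_spectrogram(wave, frame_len=512, hop=256):
    """Frame the waveform, apply a Hann window, and return |STFT| per frame."""
    n_frames = 1 + (len(wave) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [wave[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One row per frame, one column per non-negative frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))

rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000)  # stand-in for 1 s of noisy speech at 16 kHz (assumed)

# Time-domain input: the raw waveform itself.
time_domain_input = noisy

# Frequency-domain input: a magnitude spectrum per frame (phase is discarded here).
freq_domain_input = magnitude_spectrogram(noisy)
```

Under these assumptions the time-domain input has shape (16000,) and the frequency-domain input has shape (61, 257). A magnitude-only feature drops the phase, which a frequency-domain system must recover or reuse from the noisy signal when resynthesizing audio, while a waveform-to-waveform model has no such step.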