ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414060

Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder

Abstract: Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics. However, this approach only uses clean speech in the training phase, making the estimation particularly sensitive to noise presence, especially in low signal-to-noise ratios (SNRs). To increase the robustness of the VAE, we propose to include noise information in the training phase by using a noise-aware encoder trained on noisy-clean speech pairs. We evaluate our approach on real recording…

Cited by 41 publications (17 citation statements)
References: 19 publications
“…The proposed model could also be applied to pitch-informed speech enhancement. Indeed, several recent weakly-supervised speech enhancement methods consist in estimating the VAE latent representation of a clean speech signal given a noisy speech signal (Bando et al, 2018; Leglaive et al, 2018; Sekiguchi et al, 2018; Leglaive et al, 2019b,a; Pariente et al, 2019; Leglaive et al, 2020; Richter et al, 2020; Carbajal et al, 2021; Fang et al, 2021). Using the proposed conditional deep generative speech model, this estimation could be constrained given the f0 contour computed with a robust f0 estimation algorithm such as CREPE.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Thus, they are trained solely to generate clean speech and are therefore considered more robust to different acoustic environments compared to their discriminative counterparts. In fact, generative approaches have shown to perform better under mismatched training and test conditions [8,11,12,13]. However, they are currently less studied and still lag behind discriminative approaches, which is a strong incentive to conduct more research to realize their full potential.…”
Section: Forward Process (citation type: mentioning)
confidence: 99%
“…Instead of learning a direct mapping from noisy to clean speech, generative models aim to learn the distribution of clean speech as a prior for speech enhancement. Several approaches have utilized deep generative models for speech enhancement using generative adversarial networks (GANs) [4], variational autoencoders (VAEs) [5,6,7,8], flow-based models [9], and more recently denoising diffusion probabilistic models (DDPMs) [10,11]. The main principle of these approaches is to learn the inherent properties of clean speech, …”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Nevertheless, neural networks can be used in medicine in many ways. Areas of application include the detection of abnormalities in diagnostic imaging [34] and the filtering of interfering and background noise in hearing aids [12]. Current projects address the detection of vessels in cross-sectional imaging without contrast agents, which could help avoid contrast-agent-associated complications in this standard imaging procedure.…”
Section: Neuronale (citation type: unclassified)