2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
DOI: 10.1109/waspaa52581.2021.9632676

Disentanglement Learning for Variational Autoencoders Applied to Audio-Visual Speech Enhancement

Abstract: Recently, the standard variational autoencoder (VAE) has been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. VAEs have subsequently been conditioned on a label describing a high-level speech attribute (e.g. speech activity), which allows for more explicit control of speech generation. However, the label is not guaranteed to be disentangled from the other latent variables, which results in limited performance improvements compared to…

Cited by 13 publications (7 citation statements). References 21 publications.
“…Additionally, this method can adopt various DNN structures [2], so DNN-based SE algorithms [2] can be directly optimized by PVAE. This is not achieved by VAE-NMF-based algorithms [17]–[22]. The experimental results [26] indicate that the SE performance of traditional DNN-based methods can be improved by introducing this PVAE-based DRL algorithm.…”
Section: Introduction
confidence: 92%
“…Recently, to improve the generalization ability of traditional DNNs, DRL-based SE algorithms have been proposed [17]–[22]. The basic idea of these methods is to use a variational autoencoder (VAE) [23] to learn speech representations for modeling the speech, and to apply non-negative matrix factorization (NMF) to model the noise.…”
Section: Introduction
confidence: 99%
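The NMF half of the VAE-plus-NMF idea described in this statement can be sketched concretely. The following is a minimal, generic illustration of non-negative matrix factorization with Lee-Seung multiplicative updates applied to a toy "noise" spectrogram; all variable names, dimensions, and the Euclidean loss choice are illustrative assumptions, not details taken from the cited papers:

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (F x T) as W (F x rank) @ H (rank x T)
    using Lee-Seung multiplicative updates for the Euclidean loss.
    The updates keep W and H non-negative by construction."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral bases
    return W, H

# Toy non-negative "noise spectrogram": 64 frequency bins, 100 frames.
V = np.random.default_rng(1).random((64, 100))
W, H = nmf(V, rank=8)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)  # relative error
```

In the cited SE setting, such a low-rank factorization serves as the noise model while the VAE provides the clean-speech model; here it is shown in isolation.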
“…Instead of learning a direct mapping from noisy to clean speech, generative models aim to learn the distribution of clean speech as a prior for speech enhancement. Several approaches have utilized deep generative models for speech enhancement, using generative adversarial networks (GANs) [4], variational autoencoders (VAEs) [5,6,7,8], flow-based models [9], and more recently denoising diffusion probabilistic models (DDPMs) [10,11]. The main principle of these approaches is to learn the inherent properties of clean speech. We acknowledge the support by DASHH (Data Science in Hamburg - HELMHOLTZ Graduate School for the Structure of Matter) with the Grant-No.…”
Section: Introduction
confidence: 99%
“…This reparameterization trick also ensures that the parameter update in the VAE is differentiable. At present, VAEs have been widely used in the SE task [98, 105–109]. However, most of these algorithms only apply the VAE to learn the clean speech representation; they do not attempt to disentangle the clean speech representation from the noise representations, which leads to inaccurate speech estimation.…”
Section: Variational Autoencoder (VAE)
confidence: 99%
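The reparameterization trick mentioned in this statement can be sketched in a few lines. This is a generic NumPy illustration of the standard VAE sampling step (function and variable names are hypothetical, not taken from any cited work):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Because z is a deterministic function of (mu, log_var) given eps,
    gradients can flow through mu and log_var, which is what makes the
    VAE parameter update differentiable."""
    eps = rng.standard_normal(mu.shape)
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.zeros((4, 16))        # encoder mean for a batch of 4 frames
log_var = np.zeros((4, 16))   # encoder log-variance (sigma = 1 here)
z = reparameterize(mu, log_var, rng)
print(z.shape)  # (4, 16)
```

In a real VAE, `mu` and `log_var` are the encoder's outputs for each latent dimension, and the same trick is applied during training so the sampling step does not block backpropagation.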