2019
DOI: 10.1162/neco_a_01217

Supervised Determined Source Separation with Multichannel Variational Autoencoder

Abstract: This paper proposes a multichannel source separation technique called the multichannel variational autoencoder (MVAE) method, which uses a conditional VAE (CVAE) to model and estimate the power spectrograms of the sources in a mixture. By training the CVAE using the spectrograms of training examples with source-class labels, we can use the trained decoder distribution as a universal generative model capable of generating spectrograms conditioned on a specified class label. By treating the latent space variable…
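The abstract's central component is a conditional VAE whose decoder, given a source-class label, parameterizes a distribution over power spectrograms. Below is a minimal PyTorch sketch of such a class-conditional VAE; the layer sizes, the frame-wise treatment, and names such as `n_freq` and `z_dim` are illustrative assumptions, not the architecture described in the paper.

```python
# Hypothetical sketch of a class-conditional VAE over power-spectrogram frames.
# Shapes, layer widths, and the log-variance decoder output are assumptions.
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, n_freq=513, n_classes=2, z_dim=16, hidden=256):
        super().__init__()
        # Encoder q(z | s, c): spectrogram frame plus one-hot class label.
        self.enc = nn.Sequential(nn.Linear(n_freq + n_classes, hidden), nn.ReLU())
        self.enc_mu = nn.Linear(hidden, z_dim)
        self.enc_logvar = nn.Linear(hidden, z_dim)
        # Decoder p(s | z, c): log power spectrum conditioned on the class label.
        self.dec = nn.Sequential(
            nn.Linear(z_dim + n_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq),
        )

    def encode(self, s, c):
        h = self.enc(torch.cat([s, c], dim=-1))
        return self.enc_mu(h), self.enc_logvar(h)

    def decode(self, z, c):
        # Returns a log power spectrum; exp() gives the modeled variance per bin.
        return self.dec(torch.cat([z, c], dim=-1))

    def forward(self, s, c):
        mu, logvar = self.encode(s, c)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decode(z, c), mu, logvar
```

After training on labeled spectrograms, only the decoder would be retained: varying the latent variable and the class label then generates candidate source spectrograms, which is the role the abstract assigns to the trained decoder as a "universal generative model".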

Cited by 91 publications (71 citation statements)
References: 36 publications
“…VAEs have recently been studied for modeling and generation of speech signals [28,29,30] and for synthesizing music sounds [31]. They have also been used for speech enhancement [32,33] and feature learning for ASR [34,35]. Recent studies in ASV have explored the use of VAEs for data augmentation [36], regularisation [37] and domain adaptation [38] of deep speaker embeddings (x-vectors).…”
Section: Introduction
confidence: 99%
“…In the experiments, we will compare this variational E-step with an alternative proposed in [29], which consists in relying only on a point estimate of the latent variables. In our framework, this approach can be understood as assuming that the approximate posterior q(z | x; θ_enc) is a Dirac delta function centered at the maximum a posteriori estimate z⋆.”
Section: Point-estimate E-step
confidence: 99%
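The statement above contrasts a full variational E-step with the point-estimate alternative of [29]. As a hedged restatement, in notation adapted from the quote (the symbols are otherwise assumptions), the variational E-step averages the complete-data log-likelihood over the encoder's approximate posterior, whereas the point-estimate variant collapses that posterior to a Dirac delta at the MAP latent code:

```latex
% Variational E-step: expectation under the approximate posterior
\mathbb{E}_{q(\mathbf{z}\mid\mathbf{x};\,\theta_{\mathrm{enc}})}\!\left[\log p(\mathbf{x},\mathbf{z};\theta)\right]
\quad\text{vs.}\quad
% Point-estimate E-step: Dirac delta at the MAP latent code
q(\mathbf{z}\mid\mathbf{x};\,\theta_{\mathrm{enc}}) \approx \delta(\mathbf{z}-\mathbf{z}^{\star}),
\qquad
\mathbf{z}^{\star} = \operatorname*{arg\,max}_{\mathbf{z}}\; p(\mathbf{z}\mid\mathbf{x};\theta).
```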
“…both single-channel [8][9][10] and multichannel scenarios [11][12][13][14][15]. The main idea of these studies is to use variational autoencoders (VAEs) [16] to replace the NMF generative model, thus benefiting from DNNs' representational power.…”
Section: Introduction
confidence: 99%
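The quoted statement says these works replace the NMF generative model with a VAE decoder. As a rough formulation under assumptions common in this literature (the notation is mine, not taken from the quoted papers), both approaches model a source's STFT coefficient at each time-frequency bin as zero-mean complex Gaussian and differ only in how its variance is parameterized:

```latex
s_{f,n} \sim \mathcal{N}_{\mathbb{C}}\big(0,\, v_{f,n}\big), \qquad
\underbrace{v_{f,n} = \textstyle\sum_{k} w_{f,k}\, h_{k,n}}_{\text{NMF variance model}}
\quad\text{vs.}\quad
\underbrace{v_{f,n} = \sigma^{2}_{f,n}(\mathbf{z}, c;\, \theta_{\mathrm{dec}})}_{\text{VAE decoder variance model}}
```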
“…Once the VAE is trained, the encoder could in principle be used to approximate the true posterior. Interestingly, [15] proposes a heuristic algorithm for multichannel source separation based on [13], which uses this property of the encoder to achieve computational efficiency. However, this inference algorithm is not statistically principled.…”
Section: Introduction
confidence: 99%
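The statement above describes reusing the trained encoder in place of iterative latent-variable optimization. A minimal sketch of that idea, assuming the hypothetical `CVAE` class from the earlier snippet (the function name `fast_estep` and the variable `s_est` are likewise illustrative):

```python
import torch

# Hypothetical amortized-inference step: a single encoder pass replaces iterative
# optimization of z, which is faster but, as the quoted statement notes, not a
# statistically principled E-step.
def fast_estep(model, s_est, c):
    """model: illustrative CVAE; s_est: current source power-spectrogram estimate
    (one frame per row); c: matching one-hot class labels."""
    with torch.no_grad():
        mu, _ = model.encode(s_est, c)      # encoder mean as a point estimate of z
        v_hat = model.decode(mu, c).exp()   # decoded log-variance -> source variances
    return mu, v_hat
```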