2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP)
DOI: 10.1109/mlsp.2019.8918699

Deep Bayesian Unsupervised Source Separation Based On A Complex Gaussian Mixture Model

Abstract: This paper presents an unsupervised method that trains neural source separation by using only multichannel mixture signals. Conventional neural separation methods require a large amount of supervised data to achieve excellent performance. Although multichannel methods based on spatial information can work without such training data, they are often sensitive to parameter initialization and degrade when the sources are located close to each other. The proposed method uses a cost function based on a spatial model called a complex Gaussian mixture model (cGMM). …
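As a rough illustration of the training idea in the abstract, the sketch below shows how a mask-estimation network could be trained from multichannel mixtures alone by minimizing the negative log-likelihood of a cGMM whose mixture weights are the network's estimated masks. The tensor shapes, the fixed per-source-and-frequency spatial covariances, and the function name are assumptions made for illustration, not the authors' implementation.

```python
import torch

def cgmm_negative_log_likelihood(x, masks, covs, eps=1e-6):
    """Illustrative cGMM cost for unsupervised mask-network training.

    x:     (F, T, M) complex STFT of the M-channel mixture
    masks: (K, F, T) source presence probabilities from the network
    covs:  (K, F, M, M) Hermitian spatial covariance per source and frequency
    """
    n_chan = x.shape[-1]
    eye = eps * torch.eye(n_chan, dtype=covs.dtype, device=covs.device)
    inv_covs = torch.linalg.inv(covs + eye)
    # Quadratic form x^H B_k^{-1} x for every source k and TF bin -> (K, F, T)
    quad = torch.einsum("ftm,kfmn,ftn->kft", x.conj(), inv_covs, x).real
    # log|B_k| per source and frequency -> (K, F)
    logdet = torch.linalg.slogdet(covs + eye).logabsdet
    # log N_c(x; 0, B_k) up to an additive constant
    log_gauss = -quad - logdet.unsqueeze(-1)
    # Marginalize over sources using the estimated masks as mixing weights
    log_mix = torch.logsumexp(torch.log(masks + eps) + log_gauss, dim=0)
    return -log_mix.mean()
```

In the full model the spatial covariances (and source powers) would also be inferred, e.g. with EM-style updates interleaved with the gradient steps on the network parameters.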

Cited by 10 publications (5 citation statements)
References 26 publications (70 reference statements)
“…For monocular depth estimation, self-supervised learning is an active research topic because oracle depth maps are not easy to obtain [19]. Self-supervised learning of neural sound source separation has also been investigated by using a spatial model of multichannel audio signals [20], [21]. These methods do not require clean source signals, which are often unavailable in real recordings.…”
Section: B. Self-Supervised Learning With Audio and Visual Data
Citation type: mentioning (confidence: 99%)
“…In order to achieve localization of objects with similar appearance, we present a self-supervised training method for multichannel neural AV-SSL that uses pairs of 360° images and the corresponding multichannel audio mixtures. The self-supervised training is formulated in a probabilistic manner based on a spatial audio model called a cGMM as in [21].…”
Section: Self-Supervised Training of Audio-Visual Sound Source Localization
Citation type: mentioning (confidence: 99%)
“…TF masks for the integration into the mixture model are obtained with a bidirectional long short-term memory (LSTM) network in [18]. Both [16] and [21] take advantage of spatial clustering methods to train neural networks in an unsupervised way, as well as to compute the final masks in the end. In [17], a convolutional neural network (CNN) with utterance-level PIT is employed prior to the mixture model-based mask estimation.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
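The statement above mentions a bidirectional LSTM that produces TF masks for integration into a spatial mixture model. A minimal sketch of such a mask estimator is given below; the layer sizes, the log-magnitude input feature, and the softmax normalization over sources are illustrative assumptions rather than details taken from the cited papers.

```python
import torch
import torch.nn as nn

class BLSTMMaskEstimator(nn.Module):
    """Maps a single-channel log-magnitude spectrogram to K soft TF masks."""

    def __init__(self, n_freq=257, n_src=2, hidden=300):
        super().__init__()
        self.n_src, self.n_freq = n_src, n_freq
        self.blstm = nn.LSTM(n_freq, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_src * n_freq)

    def forward(self, log_mag):
        # log_mag: (B, T, F) log-magnitude spectrogram of a reference channel
        h, _ = self.blstm(log_mag)
        logits = self.out(h).view(log_mag.shape[0], -1, self.n_src, self.n_freq)
        masks = torch.softmax(logits, dim=2)   # masks sum to one over sources
        return masks.permute(0, 2, 3, 1)       # (B, K, F, T)
```

The (B, K, F, T) output can be plugged into a mixture-model cost such as the cGMM likelihood sketched earlier, or used for mask-based beamforming.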
“…The teacher-student approach has also been studied recently to address the mismatch between real and simulated environments [10][11][12][13]. When multiple microphones are available, a standard approach is to use an unsupervised spatial clustering approach as a teacher for a single-channel separation network student [10][11][12]. However, this approach relies on multimodal training data and has not been evaluated with recently developed end-to-end separation models.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
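The statement above describes the teacher-student recipe in which unsupervised spatial clustering provides targets for a single-channel student network. The sketch below shows one training step under that recipe; `spatial_clustering_masks` is a hypothetical stand-in for any clustering teacher that returns soft masks, and the mean-squared-error objective is an illustrative choice rather than the loss used in the cited works.

```python
import torch
import torch.nn.functional as F

def train_student_step(student, optimizer, stft_mix, spatial_clustering_masks):
    """One teacher-student update from a single multichannel training mixture.

    stft_mix: (M, F, T) complex STFT of the mixture (M microphones)
    spatial_clustering_masks: callable returning (K, F, T) soft teacher masks
    """
    with torch.no_grad():
        target = spatial_clustering_masks(stft_mix)          # (K, F, T) tensor
    # The student only sees a single reference channel (log-magnitude feature)
    log_mag = torch.log1p(stft_mix[0].abs()).T.unsqueeze(0)  # (1, T, F)
    # Student masks (K, F, T), e.g. from the BLSTM estimator sketched earlier
    pred = student(log_mag)[0]
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```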