ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683201

Unsupervised Deep Clustering for Source Separation: Direct Learning from Mixtures Using Spatial Information

Abstract: We present a monophonic source separation system that is trained by only observing mixtures with no ground truth separation information. We use a deep clustering approach which trains on multichannel mixtures and learns to project spectrogram bins to source clusters that correlate with various spatial features. We show that using such a training process we can obtain separation performance that is as good as making use of ground truth separation information. Once trained, this system is capable of performing s…

Cited by 35 publications (28 citation statements)
References 14 publications
“…Multichannel-audio-based methods, on the other hand, can train a DNN to separate sound sources out of view or behind obstacles. Tzinis et al [14] trained a monaural separation network by using source signals estimated by applying K-means clustering on interchannel phase differences (IPDs) between two microphones. Almost simultaneously, Drude et al [12] proposed a similar approach that uses signals separated by the cACGMM [21].…”
Section: Unsupervised Training Of Neural Source Separation
Type: mentioning
confidence: 99%
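
The IPD-clustering step described in this statement can be illustrated with a minimal sketch. This is not the cited authors' implementation: the function names (`ipd_features`, `kmeans`, `ipd_kmeans_masks`) are hypothetical, the two-channel STFTs are assumed to be precomputed complex arrays of shape (frequency, time), and the sketch ignores the frequency dependence of phase differences and spatial aliasing that a real system would need to handle. It clusters each time-frequency bin by its interchannel phase difference and returns one binary mask per cluster, which could then serve as training targets for a monaural network.

```python
import numpy as np

def ipd_features(X_left, X_right):
    """Map each T-F bin's interchannel phase difference onto the unit circle."""
    ipd = np.angle(X_left * np.conj(X_right))
    # (cos, sin) avoids the wrap-around discontinuity at +/- pi.
    return np.stack([np.cos(ipd), np.sin(ipd)], axis=-1)

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def ipd_kmeans_masks(X_left, X_right, n_sources=2):
    """Return binary masks of shape (n_sources, freq, time), one per IPD cluster."""
    feats = ipd_features(X_left, X_right)
    labels = kmeans(feats.reshape(-1, 2), n_sources).reshape(X_left.shape)
    return np.stack([labels == c for c in range(n_sources)]).astype(float)
```

Multiplying a mask with the magnitude spectrogram of one channel gives a rough estimate of the corresponding source. The order of the masks is arbitrary, since clustering does not know which cluster belongs to which source.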
“…Unsupervised training for neural source separation using multichannel mixture signals has recently gained a lot of attention [12][13][14][15]. One approach is to generate supervised data by using multichannel separation methods [12][13][14]. This approach suffers from the estimation errors of the multichannel methods mentioned above.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Recently, unsupervised DNN training techniques have been proposed [19,20]. These techniques estimate a time-frequency mask based on the DC.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Another option is to generate intermediate masks with an unsupervised teacher, as proposed in e.g. [10,11], and also in [12] where we demonstrate how to leverage a probabilistic spatial mixture model, namely a complex angular central Gaussian mixture model (cACGMM), to generate intermediate masks. However, this approaches require a - possibly hand-crafted - teacher system and also a lot of computational resources to either store the intermediate masks or generate them on-the-fly.…”
Section: Introduction
Type: mentioning
confidence: 99%