ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054171

Unsupervised Training for Deep Speech Source Separation with Kullback-Leibler Divergence Based Probabilistic Loss Function

Abstract: In this paper, we propose a multi-channel speech source separation method based on a deep neural network (DNN) that is trained under the condition that no clean signal is available. As an alternative to a clean signal, the proposed method adopts a speech signal estimated by unsupervised speech source separation with a statistical model. As the statistical model of the microphone input signal, we adopt a time-varying spatial covariance matrix (SCM) model that includes reverberation and background noise submodels so as to …

Cited by 16 publications (10 citation statements) | References 29 publications

“…The source signals estimated by a conventional BSS method can be used as pseudo-supervised data. Togami et al. [23] proposed to train a network to predict a multichannel Wiener filter (MWF) estimated by FCA. Drude et al. [22] proposed to train a network by directly maximizing the log-marginal likelihood of a BSS model called the complex angular central Gaussian mixture model (cACGMM) [31].…”
Section: B. Unsupervised Neural Source Separation
confidence: 99%
“…The mixture signals are generated at 16 kHz. As in the previous studies [22], [23], we applied dereverberation [35] to the mixture signals. To stabilize the dereverberation, we added white Gaussian noise at a signal-to-noise ratio of 30 dB to the mixture signals.…”
Section: A. Dataset
confidence: 99%
“…Recently, several works have proposed to learn the network parameters by directly optimizing the output of the separation [9]. They extend the method to unsupervised learning [10] and resource-constrained environments [11]. We note that these latter methods make use of powerful, yet computation-hungry, spatial filtering techniques, limiting the number of iterations of the algorithms through which backpropagation can be safely done.…”
Section: Introduction
confidence: 99%