Interspeech 2018
DOI: 10.21437/interspeech.2018-2326

Music Source Activity Detection and Separation Using Deep Attractor Network

Abstract: In music signal processing, singing voice detection and music source separation are widely researched topics. Recent progress in deep neural network based source separation has advanced the state of the art in vocal and instrument separation, while the problem of joint source activity detection and separation remains unexplored. In this paper, we propose an approach to perform source activity detection using the high-dimensional embedding generated by a Deep Attractor Network (DANet) when …
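The DANet mechanism the abstract refers to maps each time-frequency bin to a high-dimensional embedding and forms one "attractor" per source; separation masks then come from embedding-attractor similarity. A minimal numerical sketch of that idea, with random embeddings and hypothetical shapes in place of the paper's trained BLSTM:

```python
import numpy as np

# Hypothetical shapes: T frames, F frequency bins, D embedding dims, C sources.
# In the actual DANet the embeddings V come from a trained BLSTM; random here.
T, F, D, C = 10, 32, 20, 2
rng = np.random.default_rng(0)

V = rng.standard_normal((T * F, D))             # one embedding per T-F bin
Y = np.eye(C)[rng.integers(0, C, size=T * F)]   # ideal binary source assignments

# Attractors: centroid of the embeddings assigned to each source.
attractors = (Y.T @ V) / Y.sum(axis=0)[:, None]   # (C, D)

# Soft separation masks: softmax over embedding-attractor similarity.
logits = V @ attractors.T                     # (T*F, C)
logits -= logits.max(axis=1, keepdims=True)   # numerical stability
masks = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

print(masks.shape)  # (320, 2)
```

Source activity detection, as proposed in the paper, can then be read off the same embedding space (e.g., from how strongly frames cluster around each attractor), though that step is not shown here.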


Cited by 19 publications (9 citation statements). References 21 publications.
“…Recently, the work in [17] demonstrated that a common embedding space for musical instrument separation using various deep attractor networks could achieve competitive performance. Our system is similar to the anchored and/or expectation-maximization deep attractor networks in [17], but we use an auxiliary network to estimate the mean and covariance parameters for each instrument. We also explore what type of covariance model is most effective for musical source separation (tied vs. untied across classes, diagonal vs. spherical).…”
Section: Introduction (mentioning)
Confidence: 99%
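The tied/untied and diagonal/spherical covariance variants this citing work compares can be sketched numerically. The following is a hypothetical illustration with random data, using closed-form Gaussian estimates in place of the citing work's auxiliary network:

```python
import numpy as np

# Hypothetical sketch of Gaussian attractors under two covariance models:
# untied diagonal (one variance vector per class) vs. tied spherical
# (one scalar variance shared by all classes). Data is random.
rng = np.random.default_rng(1)
N, D, C = 500, 8, 3
V = rng.standard_normal((N, D))             # embeddings
Y = np.eye(C)[rng.integers(0, C, size=N)]   # hard class assignments
counts = Y.sum(axis=0)[:, None]             # (C, 1)

means = (Y.T @ V) / counts                  # per-class means, (C, D)
diffs = V[:, None, :] - means[None, :, :]   # (N, C, D)

# Untied diagonal covariance: E[x^2] - mean^2, per class and dimension.
var_diag = (Y.T @ V**2) / counts - means**2            # (C, D)
# Tied spherical covariance: one scalar pooled over all classes and dims.
var_sph = (Y[:, :, None] * diffs**2).sum() / (N * D)   # scalar

# Per-class Gaussian log-likelihoods -> posterior (soft) assignments.
ll = -0.5 * ((diffs**2 / var_diag) + np.log(var_diag)).sum(axis=-1)
post = np.exp(ll - ll.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)
print(post.shape)  # (500, 3)
```

Tying the covariance across classes trades flexibility for fewer parameters; which choice wins for musical sources is exactly the empirical question the citing work investigates.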
“…The model we use for source separation is based on deep clustering (DC) [6,21]. We selected deep clustering because it is a highly successful approach that has inspired multiple successful variants [22][23][24][25][26][27]. Further, its separation framework is somewhat connected to our primitive spatial separation as it is based on clustering as well, but in a learned embedding space, and its objective function has been shown to be amenable to the introduction of weights.…”
Section: Training the Single-channel Model (mentioning)
Confidence: 99%
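The deep clustering objective this citing work builds on minimizes the Frobenius distance between the embedding affinity matrix and the label affinity matrix. A minimal numerical sketch with random data (not the cited model), showing the memory-efficient expanded form that avoids the N x N affinity matrices:

```python
import numpy as np

# Deep clustering loss ||V V^T - Y Y^T||_F^2 on random unit-norm embeddings.
rng = np.random.default_rng(2)
N, D, C = 200, 16, 2
V = rng.standard_normal((N, D))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit-norm embeddings
Y = np.eye(C)[rng.integers(0, C, size=N)]       # ideal binary assignments

# Expanded form: ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2,
# using only (D x D), (D x C), and (C x C) matrices.
loss = (np.linalg.norm(V.T @ V) ** 2
        - 2 * np.linalg.norm(V.T @ Y) ** 2
        + np.linalg.norm(Y.T @ Y) ** 2)

# Direct form for comparison (materializes the N x N affinities).
direct = np.linalg.norm(V @ V.T - Y @ Y.T) ** 2
print(np.isclose(loss, direct))  # True
```

The loss is small exactly when bins of the same source have similar embeddings and bins of different sources are near-orthogonal, which is what makes the embedding space clusterable and, as the citing work notes, amenable to per-bin weighting.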
“…Specifically for separation tasks, speaker-discriminative embeddings are produced for targeted voice separation in [6] and for diarization in [17] yielding a significant improvement over the unconditional separation framework. Recent works [18,19] have utilized conditional embeddings for each music class in order to boost the performance of a deep attractor-network [20] for music separation.…”
Section: Introduction (mentioning)
Confidence: 99%