2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015
DOI: 10.1109/icassp.2015.7177935

Scalable audio separation with light Kernel Additive Modelling

Abstract: Recently, Kernel Additive Modelling (KAM) was proposed as a unified framework to achieve multichannel audio source separation. Its main feature is to use kernel models for locally describing the spectrograms of the sources. Such kernels can capture source features such as repetitivity, stability over time and/or frequency, self-similarity, etc. KAM notably subsumes many popular and effective methods from the state of the art, including REPET and harmonic/percussive separation with median filters. However, it a…

Cited by 46 publications (42 citation statements)
References 27 publications
“…In addition, training on real data now leads to a performance decrease on simulated data, while Sivasankaran et al (2015) found it to consistently improve performance on both real and simulated data. Along with the recent results of Nugraha et al (2016b) on another dataset, this suggests that, although weighted EM made little difference for spectral models other than DNN (Liutkus et al, 2015), weighted EM outperforms exact EM for the estimation of multichannel statistics from DNN outputs. More work on the estimation of multichannel statistics from DNN outputs is therefore also required.…”
Section: Impact Of Ground Truth Estimation (mentioning)
confidence: 95%
“…We performed this experiment using the DNN-based multichannel source separation technique of Nugraha et al (2016a), which is a variant of the one of Sivasankaran et al (2015) that relies on exact EM updates for the spatial covariance matrices (Duong et al, 2010) instead of the weighted EM updates of Liutkus et al (2015).…”
Section: Impact Of Ground Truth Estimation (mentioning)
confidence: 99%
“…Audio source separation has attracted considerable attention in the last decade. Various approaches have been introduced so far such as local Gaussian modeling [1,2], non-negative factorization [3][4][5], kernel additive modeling [6] and combinations of those approaches [7][8][9]. Recently, deep neural networks (DNNs) based source separation methods has shown significant improvement in separation performance over earlier methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…We use the training partition of the DSD100 [9] database as our supervised training set D_m. We split the multi-track databases iKala [7], MedleyDB [19] and CCMixter [20] into thirds, and use one third of tracks from each database to form the unlabelled dataset D_u and the source datasets D_s^k needed for semi-supervised training. Our validation and test set is each built by taking another third of tracks from iKala, MedleyDB, and CCMixter, in addition to 25 tracks from the test partition of DSD100.…”
Section: Datasets (mentioning)
confidence: 99%