Unsupervised Music Source Separation Using Differentiable Parametric Source Models

Schulze-Forster, Kilian; Richard, Gaël; Kelley, L. A.; Doire, Clement S. J.; Badeau, Roland

doi:10.1109/taslp.2023.3252272

Cited by 9 publications (16 citation statements)

References 63 publications (136 reference statements)


“…The original model proposed in [1], referred to as UMSS, was shown to be efficient for complex source separation problems in which individual sources are not available for training, the sources are homogeneous (only singing voices), or only a limited amount of mixture recording data is obtainable. The approach is inspired by the recent hybrid deep learning paradigm, which integrates signal processing models in DNNs to incorporate domain knowledge [9,10].…”

Section: Unsupervised Music Source Separation (mentioning)

confidence: 99%

“…The latter strategy obtains the best overall results and is then selected in this work. The model used in [1] to obtain source fundamental frequencies was given in [11] and performs multi-F0 extraction by first processing a spectral representation through a DNN, and then converting the output multi-frequency salience map to F0 contours. In [1], a voice assignment heuristic based on temporal pitch continuity was…”

Section: Unsupervised Music Source Separation (mentioning)

confidence: 99%

“…2. It is based on the unsupervised model described above [1] but with the integration of the multi-F0 estimation and vocal assignment as three differentiable blocks (displayed in purple on Fig. 2).…”

Section: The Complete Model (mentioning)

confidence: 99%

“…The resulting architecture is then fully differentiable and can be trained end-to-end. More precisely, the proposed architecture takes as input a 4-second audio mixture which is processed in parallel in two branches: 1) the first branch is based on the encoder of [1], which extracts the main characteristics of the observed audio. 2) The second branch is dedicated to the estimation of multiple fundamental frequencies.…”

Section: The Complete Model (mentioning)

confidence: 99%

“…This paper builds upon the work of Schulze-Forster et al. [1], which proposes an unsupervised DNN model that has shown state-of-the-art performance in separating choral singing. We expand their work by integrating the multi-F0 estimation and voice assignment modules as differentiable blocks within the model and by proposing a novel method for differentiable F0 contour extraction.…”

Section: Introduction (mentioning)

confidence: 99%


A Fully Differentiable Model for Unsupervised Singing Voice Separation

Richard, Chouteau, Torres, 2024

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


A novel model was recently proposed for unsupervised music source separation. This model addresses some of the major shortcomings of existing source separation frameworks. Specifically, it eliminates the need for isolated sources during training, performs efficiently with limited data, and can handle homogeneous sources (such as singing voices). However, this model relies on an external multipitch estimator and incorporates an ad hoc voice assignment procedure. In this paper, we propose to extend this framework and to build a fully differentiable model by integrating a multipitch estimator and a novel differentiable assignment module within the core model. We show the merits of our approach through a set of experiments, and we highlight in particular its potential for processing diverse and unseen data.


Section: Unsupervised Music Source Separation (mentioning)

confidence: 99%