Voice Assignment in Vocal Quartets Using Deep Learning Models Based on Pitch Salience

Cuesta, Helena

doi:10.5334/tismir.121

Cited by 5 publications

(18 citation statements)

References 18 publications

(15 reference statements)


“…We assume that the fundamental frequencies for each of the J sources can be obtained from the mixture signal with a multiple F0 estimation system. Given that many such systems exist [45], [18], [46] and that it is still an active research area, we are confident that this is a reasonable assumption. When all F0s are obtained, each F0 value needs to be assigned to one specific source.…”

Section: B Parameter Estimation (mentioning)

confidence: 90%
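The assignment step mentioned in the statement above can be illustrated with a deliberately naive baseline. This is a minimal sketch, not the method of the cited paper: it assumes a four-part SATB quartet, assumes the voices never cross in pitch, and treats an F0 of 0 as "unvoiced". The names `VOICES` and `assign_by_pitch_order` are hypothetical.

```python
# Naive per-frame voice assignment: highest F0 goes to the highest voice.
# Assumption: SATB ordering with no voice crossing; 0.0 marks "no pitch".
VOICES = ("soprano", "alto", "tenor", "bass")


def assign_by_pitch_order(frame_f0s, voices=VOICES):
    """Map the F0 values (Hz) of one frame to voices by descending pitch."""
    # Drop unvoiced entries, sort high to low, keep at most one F0 per voice.
    ordered = sorted((f for f in frame_f0s if f > 0), reverse=True)[:len(voices)]
    assignment = {voice: None for voice in voices}
    for voice, f0 in zip(voices, ordered):
        assignment[voice] = f0
    return assignment


frame = [196.0, 440.0, 0.0, 330.0, 262.0]
result = assign_by_pitch_order(frame)
```

When fewer F0s than voices are detected in a frame, this heuristic fills the upper voices first, which is often wrong; resolving that ambiguity is exactly what a learned assignment model is meant to handle.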

“…Section IV-B. F0 estimates are usually provided at a frame rate which is smaller than the sample rate [45], [18], [46]. Therefore, following [17], the source specific F0 time series are upsampled to the sample rate using bilinear interpolation.…”

Section: B Parameter Estimation (mentioning)

confidence: 99%
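The upsampling step described in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the citing authors' code: it uses plain linear interpolation along time, assumes frame i sits at sample i * hop_size, and ignores unvoiced frames. The function name `upsample_f0` and the parameter `hop_size` are hypothetical.

```python
def upsample_f0(f0_frames, hop_size):
    """Linearly interpolate frame-wise F0 values (Hz) to one value per sample.

    Assumption: frame i is located at sample index i * hop_size.
    """
    if hop_size < 1:
        raise ValueError("hop_size must be >= 1")
    if not f0_frames:
        return []
    out = []
    for i in range(len(f0_frames) - 1):
        start, end = f0_frames[i], f0_frames[i + 1]
        for k in range(hop_size):
            # Fraction of the way from frame i to frame i + 1.
            frac = k / hop_size
            out.append(start + frac * (end - start))
    # The last frame has no successor, so it contributes a single sample.
    out.append(f0_frames[-1])
    return out


f0_per_sample = upsample_f0([220.0, 224.0, 222.0], hop_size=4)
```

In practice one would avoid interpolating between a voiced frame and an unvoiced one (F0 = 0), since that produces a glide through frequencies that were never sung.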

“…The F0s are obtained from the mixture signals using the multiple F0 estimation model of Cuesta et al. [18]. We use the pre-trained "Model 3" which is available online.…”

Section: B Experimental Setup (mentioning)

confidence: 99%

“…In [11], $H_{F0}$ is initialized using F0 information of the predominant source estimated using the signal model in (16). We initialize $H_{F0}$ using the F0 information we obtained from the multi-pitch estimation [18]. In [11], the spectral templates of the residual sources $W_O \in \mathbb{R}^{F \times R}$ and their activations $H_O \in \mathbb{R}^{R \times N}$ are initialized randomly.…”

Section: Baselines (mentioning)

confidence: 99%

“…Separation is achieved because the F0s for all sources are estimated from the mixture and assigned to the sources beforehand. This can be done using existing methods such as [18], [19].…”

Section: Introduction (mentioning)

confidence: 99%


Unsupervised Music Source Separation Using Differentiable Parametric Source Models

Schulze-Forster

Richard

Kelley

et al. 2023

IEEE/ACM Trans. Audio Speech Lang. Process.


Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely costly to obtain for musical mixtures. This raises a need for unsupervised methods. We propose a novel unsupervised model-based deep learning approach to musical source separation. Each source is modelled with a differentiable parametric source-filter model. A neural network is trained to reconstruct the observed mixture as a sum of the sources by estimating the source models' parameters given their fundamental frequencies. At test time, soft masks are obtained from the synthesized source signals. The experimental evaluation on a vocal ensemble separation task shows that the proposed method outperforms learning-free methods based on nonnegative matrix factorization and a supervised deep learning baseline. Integrating domain knowledge in the form of source models into a data-driven method leads to high data efficiency: the proposed approach achieves good separation quality even when trained on less than three minutes of audio. This work makes powerful deep learning based separation usable in scenarios where training data with ground truth is expensive or nonexistent.
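The test-time masking mentioned in the abstract above can be illustrated with a common ratio-mask formulation. This is a sketch under assumptions: the abstract only says that soft masks are obtained from the synthesized source signals, so the magnitude-ratio form, the `eps` stabiliser, and the names `soft_masks` and `apply_mask` are all hypothetical. Spectrograms are represented as plain nested lists indexed as [frame][bin].

```python
def soft_masks(source_mags, eps=1e-8):
    """Build one soft mask per source from synthesized magnitude spectrograms.

    Assumption: mask_j = |S_j| / (sum_k |S_k| + eps), computed per bin.
    """
    n_frames = len(source_mags[0])
    n_bins = len(source_mags[0][0])
    masks = [[[0.0] * n_bins for _ in range(n_frames)] for _ in source_mags]
    for t in range(n_frames):
        for f in range(n_bins):
            total = sum(mag[t][f] for mag in source_mags) + eps
            for j, mag in enumerate(source_mags):
                masks[j][t][f] = mag[t][f] / total
    return masks


def apply_mask(mask, mixture_stft):
    """Multiply a real-valued mask with a complex mixture STFT, bin by bin."""
    return [
        [m * x for m, x in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(mask, mixture_stft)
    ]


# Toy data: two synthesized sources, one frame, two frequency bins.
synth_a = [[3.0, 0.0]]
synth_b = [[1.0, 2.0]]
mixture = [[complex(4.0, 0.0), complex(0.0, 2.0)]]

masks = soft_masks([synth_a, synth_b])
estimate_a = apply_mask(masks[0], mixture)
```

Because the masks at each bin sum to approximately one, the source estimates add back up to the mixture, and the phase of each estimate is taken from the mixture rather than from the synthesized signal.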



A Fully Differentiable Model for Unsupervised Singing Voice Separation

Richard,

Chouteau,

Torres

2024

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


A novel model was recently proposed for unsupervised music source separation. This model makes it possible to tackle some of the major shortcomings of existing source separation frameworks. Specifically, it eliminates the need for isolated sources during training, performs efficiently with limited data, and can handle homogeneous sources (such as singing voice). However, this model relies on an external multipitch estimator and incorporates an ad hoc voice assignment procedure. In this paper, we propose to extend this framework and to build a fully differentiable model by integrating a multipitch estimator and a novel differentiable assignment module within the core model. We show the merits of our approach through a set of experiments, and we highlight in particular its potential for processing diverse and unseen data.


Artificial Intelligence and Musicking

Berkowitz

2024

Music Perception: An Interdisciplinary Journal


Artificial intelligence (AI) deployed for customer relationship management (CRM), digital rights management (DRM), content recommendation, and content generation challenges longstanding truths about listening to and making music. CRM uses music to surveil audiences, removes decision-making responsibilities from consumers, and alters relationships among listeners, artists, and music. DRM overprotects copyrighted content by subverting Fair Use Doctrine and privatizing the Public Domain, thereby restricting human creativity. Generative AI, often trained on music misappropriated by developers, renders novel music that seemingly represents neither the artistry present in the training data nor the handiwork of the AI’s user. AI music, as such, appears to be produced through AI cognition, resulting in what some have called “machine folk” and contributing to a “culture in code.” A philosophical analysis of these relationships is required to fully understand how AI impacts music, artists, and audiences. Using metasynthesis and grounded theory, this study considers physical reductionism, metaphysical nihilism, existentialism, and modernity to describe the quiddity of AI’s role in the music ecosystem. Concluding thoughts call researchers and educators to act on philosophical and ethical discussions of AI and promote continued research, public education, and democratic/laymen intervention to ensure ethical outcomes in the AI music space.
