2017
DOI: 10.1109/taslp.2016.2632528
On-the-Fly Audio Source Separation—A Novel User-Friendly Framework

Abstract: This article addresses the challenging problem of single-channel audio source separation. We introduce a novel user-guided framework in which the source models that govern the separation process are learned on the fly from audio examples retrieved online. The user only provides search keywords that describe the sources in the mixture. In this framework, the generic spectral characteristics of each source are modeled by a universal sound class model learned from the retrieved examples via non-negative mat…
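The abstract sketches the pipeline at a high level: retrieve audio examples for each user keyword, learn a spectral model per sound class with non-negative matrix factorization (NMF), then separate the mixture with those models. The following is a minimal NumPy sketch of that kind of pipeline, not the paper's exact method: it assumes magnitude spectrograms are already computed, and the function names, the concatenate-and-factorize class model, and the Wiener-like masking are illustrative assumptions.

import numpy as np

def nmf_kl(V, n_components, n_iter=100, W=None, seed=0):
    # Multiplicative-update NMF for the generalized KL divergence.
    # If W is given it stays fixed and only the activations H are updated.
    rng = np.random.default_rng(seed)
    F, N = V.shape
    fixed_W = W is not None
    if W is None:
        W = rng.random((F, n_components)) + 1e-3
    H = rng.random((n_components, N)) + 1e-3
    eps = 1e-12
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ np.ones_like(V) + eps)
        if not fixed_W:
            W *= ((V / (W @ H + eps)) @ H.T) / (np.ones_like(V) @ H.T + eps)
    return W, H

def learn_source_models(examples_per_keyword, n_components=32, n_iter=20):
    # examples_per_keyword: {keyword: [magnitude spectrograms of retrieved examples]}
    # One simple variant of a "universal" class model: concatenate the retrieved
    # examples in time and learn a single spectral dictionary per keyword.
    models = {}
    for keyword, spectrograms in examples_per_keyword.items():
        V = np.concatenate(spectrograms, axis=1)
        W, _ = nmf_kl(V, n_components, n_iter=n_iter)
        models[keyword] = W
    return models

def separate(V_mix, models, n_iter=20):
    # Stack the class dictionaries, fit only the activations on the mixture,
    # then reconstruct each source with a Wiener-like soft mask.
    W = np.concatenate(list(models.values()), axis=1)
    _, H = nmf_kl(V_mix, W.shape[1], n_iter=n_iter, W=W)
    total = W @ H + 1e-12
    sources, k = {}, 0
    for keyword, Wj in models.items():
        Hj = H[k:k + Wj.shape[1], :]
        sources[keyword] = (Wj @ Hj) / total * V_mix
        k += Wj.shape[1]
    return sources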

Citations: Cited by 9 publications (8 citation statements)
References: 31 publications
“…The number of NMF components in W_j^l for each speech example was set to 32, while that for each noise example was 16. These values were found to be reasonable in [15] and in our work on the single-channel case [18]. Each W_j^l was obtained by optimizing (17) with 20 MU iterations.…”
Section: A. Dataset and Parameter Settings (supporting)
confidence: 62%
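As a concrete reading of the settings quoted above (32 NMF components per speech example, 16 per noise example, 20 multiplicative-update iterations), a hedged sketch reusing the nmf_kl helper from the block after the abstract could look as follows; the function and variable names are hypothetical, and criterion (17) of the cited work is not reproduced here, a plain KL-NMF fit stands in for it.

def train_example_dictionaries(speech_examples, noise_examples):
    # speech_examples / noise_examples: lists of magnitude spectrograms,
    # one per retrieved example; returns one dictionary W per example.
    dictionaries = []
    for V in speech_examples:
        W, _ = nmf_kl(V, n_components=32, n_iter=20)   # 20 MU iterations
        dictionaries.append(W)
    for V in noise_examples:
        W, _ = nmf_kl(V, n_components=16, n_iter=20)
        dictionaries.append(W)
    return dictionaries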
“…This leads to a straightforward extension of the conventional optimization criterion described by (15), where H_j is now estimated by optimizing the criterion:…”
Section: B. Proposed Source Variance Fitting With GSSM and Mixed Grou… (mentioning)
confidence: 99%
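Criterion (15) of the citing paper is not reproduced in the excerpt. As a hedged illustration only, source-variance fitting with a generic source spectral model (GSSM) and a group-sparsity penalty is typically written in a form like

\min_{H \ge 0} \; D_{\mathrm{IS}}\!\left(V \,\middle\|\, W H\right) \;+\; \lambda \sum_{g} \log\!\left(\epsilon + \lVert H_g \rVert_1\right),

where D_IS is the Itakura-Saito divergence between the observed mixture variance V and the model W H, the groups H_g collect the activation rows belonging to one retrieved example or sound class, and lambda and epsilon are hyperparameters. This is an assumed generic form, not the exact criterion referred to above.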
“…We also explore what type of covariance model is most effective for musical source separation (tied vs. untied across classes, diagonal vs. spherical). Furthermore, we discuss a simple modification of our pre-trained embedding networks for query-by-example separation [18,19,20], where, given an isolated example of a sound we want to separate, we can extract the portion of a mixture most like the query without supervision.…”
Section: Introduction (mentioning)
confidence: 99%
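The query-by-example idea mentioned in this excerpt can be pictured with a small, purely illustrative sketch: embed the isolated query and the mixture frames with some pretrained frame-level embedding (the embedding step is assumed and not specified here), then keep the mixture frames most similar to the query. The cosine-similarity masking below is a placeholder, not the cited authors' method.

import numpy as np

def query_by_example_mask(mix_frame_emb, query_frame_emb, threshold=0.5):
    # mix_frame_emb: (T, D) per-frame embeddings of the mixture.
    # query_frame_emb: (Tq, D) per-frame embeddings of the isolated query.
    # Returns a (T,) time mask selecting frames most similar to the query.
    q = query_frame_emb.mean(axis=0)
    q = q / (np.linalg.norm(q) + 1e-12)
    m = mix_frame_emb / (np.linalg.norm(mix_frame_emb, axis=1, keepdims=True) + 1e-12)
    similarity = m @ q
    return (similarity > threshold).astype(float)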