An interactive audio source separation framework based on non-negative matrix factorization

Duong, Ngoc Q. K.; Ozerov, Alexey; Chevallier, Louis; Sirot, Joel

doi:10.1109/icassp.2014.6853861

Cited by 20 publications

(18 citation statements)

References 13 publications

“…Such information can be e.g., user-"hummed" sounds that mimic the sources in the mixture [13] or source activity annotation along time [14] or in a time-frequency plane [15]; the annotation information is then used, instead of training data, to guide the separation process. Furthermore, recent publications disclose an interactive strategy [16], [17] where the user can perform annotations on the spectrogram of intermediate separation results to gradually correct the remaining errors. Note however that most of the existing approaches need to use prior information which may not be easy to acquire in advance (e.g., musical score, text transcript), is difficult to produce (e.g., user-hummed examples), or simply requires very experienced users while being very time consuming (e.g., time-frequency annotations).…”

Section: Introduction (mentioning)

confidence: 99%

On-the-Fly Audio Source Separation—A Novel User-Friendly Framework

Badawy

Duong

Ozerov

2017

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite

Abstract: This article addresses the challenging problem of single-channel audio source separation. We introduce a novel user-guided framework where source models that govern the separation process are learned on-the-fly from audio examples retrieved online. The user only provides the search keywords that describe the sources in the mixture. In this framework, the generic spectral characteristics of each source are modeled by a universal sound class model learned from the retrieved examples via non-negative matrix factorization. We propose several group sparsity-inducing constraints in order to efficiently exploit a relevant subset of the universal model adapted to the mixture to be separated. We then derive the corresponding multiplicative update rules for parameter estimation. Separation results obtained from automated and user tests on mixtures containing various types of sounds confirm the effectiveness of the proposed framework.

Index Terms: On-the-fly audio source separation, user-guided, non-negative matrix factorization, group sparsity, universal sound class model.
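To make the abstract's reference to non-negative matrix factorization with multiplicative update rules concrete, here is a minimal sketch of plain NMF in pure Python. It is not the authors' algorithm: the group sparsity constraints and the universal sound class model are omitted, and the generalized Kullback-Leibler cost, the function name `nmf_kl`, and the toy matrix `V` are all assumptions made for illustration.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || V_hat)."""
    total = 0.0
    for v_row, h_row in zip(V, V_hat):
        for v, h in zip(v_row, h_row):
            total += v * math.log((v + eps) / (h + eps)) - v + h
    return total


def elementwise_ratio(V, V_hat, eps=1e-12):
    """Entry-wise V / V_hat, guarded against division by zero."""
    return [[v / (h + eps) for v, h in zip(vr, hr)] for vr, hr in zip(V, V_hat)]


def nmf_kl(V, rank, n_iter=100, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (F x N) as W (F x rank) times H (rank x N).

    Uses the standard multiplicative updates for the generalized KL cost
    (an assumed cost; the abstract does not name one).
    """
    rng = random.Random(seed)
    F, N = len(V), len(V[0])
    # Strictly positive random initialization keeps every update well defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(N)] for _ in range(rank)]
    costs = []
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (W^T 1)
        ratio = elementwise_ratio(V, matmul(W, H), eps)
        Wt = transpose(W)
        num = matmul(Wt, ratio)
        w_col_sums = [sum(row) for row in Wt]
        H = [[H[k][n] * num[k][n] / (w_col_sums[k] + eps) for n in range(N)]
             for k in range(rank)]
        # W <- W * ((V / WH) H^T) / (1 H^T)
        ratio = elementwise_ratio(V, matmul(W, H), eps)
        num = matmul(ratio, transpose(H))
        h_row_sums = [sum(row) for row in H]
        W = [[W[f][k] * num[f][k] / (h_row_sums[k] + eps) for k in range(rank)]
             for f in range(F)]
        costs.append(kl_divergence(V, matmul(W, H), eps))
    return W, H, costs


# Toy "spectrogram": 4 frequency bins by 5 frames, built from two spectral patterns.
V = [
    [4.0, 4.0, 1.0, 1.0, 4.0],
    [2.0, 2.0, 0.5, 0.5, 2.0],
    [1.0, 1.0, 3.0, 3.0, 1.0],
    [0.5, 0.5, 6.0, 6.0, 0.5],
]
W, H, costs = nmf_kl(V, rank=2)
print("first cost:", costs[0], "last cost:", costs[-1])
```

In a separation setting, the columns of `W` would play the role of spectral patterns and the rows of `H` their activations over time; the multiplicative form keeps both factors non-negative without any explicit projection step.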

“…In summary, this strategy consists in first updating locally all the entries of one matrix using the corresponding update among (12), (13) and (15), and in choosing one entry per column yielding the highest likelihood while setting to zero all the other entries (see [20] for more details). This strategy guarantees a local optimization of the cost (11) in the sense that the cost is guaranteed to remain non-increasing after each update.…”

Section: Updates With Structural Constraints (mentioning)

confidence: 99%
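The selection step described in the statement above (update every entry, then keep one entry per column and zero the rest) can be sketched as follows. This is a hedged illustration only: the helper name `keep_best_per_column` is hypothetical, and the `scores` matrix stands in for the per-entry likelihood, whose actual form depends on the cost (11) of the citing paper, which is not reproduced here.

```python
def keep_best_per_column(values, scores):
    """Keep, in each column, only the entry with the highest score.

    `values` holds the locally updated entries and `scores` holds an assumed
    likelihood for each entry; both are lists of rows with the same shape.
    All entries other than the best-scoring one per column are set to zero.
    """
    n_rows, n_cols = len(values), len(values[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for n in range(n_cols):
        # Row index of the highest-scoring entry in column n.
        best = max(range(n_rows), key=lambda k: scores[k][n])
        out[best][n] = values[best][n]
    return out


# Hypothetical updated entries and log-likelihood scores for a 2 x 3 matrix.
values = [[0.2, 0.9, 0.1],
          [0.7, 0.3, 0.4]]
scores = [[-1.0, -0.2, -3.0],
          [-0.5, -0.9, -2.5]]
constrained = keep_best_per_column(values, scores)
print(constrained)  # [[0.0, 0.9, 0.0], [0.7, 0.0, 0.4]]
```

The sketch covers only the structural projection; the non-increasing cost property mentioned in the statement depends on how the scores are computed from the actual cost, which this example does not model.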

“…It consists in using some auxiliary information about the sources and/or the mixing process to guide the separation. For example, score-informed approaches rely on musical score to guide the separation in music recordings [3][4][5][6], separation-by-humming (SbH) algorithms exploit a sound "hummed" by the user mimicking the source of interest [7,8], and user-guided approaches take into account knowledge about, e.g., user-selected F0 track [9] or user-annotated source activity patterns along the spectrogram of the mixture [10,11] and/or that of the estimated sources [12,13]. In line with this direction, there are also speech separation systems informed, e.g., by speaker gender [14], by corresponding video [15], or by the natural language structure [16].…”

Section: Introduction (mentioning)

confidence: 99%

Text-Informed Audio Source Separation. Example-Based Approach Using Non-Negative Matrix Partial Co-Factorization

Le Magoarou

Ozerov

Duong

2014

J Sign Process Syst

Self Cite

The so-called informed audio source separation, where the separation process is guided by some auxiliary information, has recently attracted a lot of research interest since classical blind or non-informed approaches often do not lead to satisfactory performances in many practical applications. In this paper we present a novel text-informed framework in which a target speech source can be separated from the background in the mixture using the corresponding textual information. First, given the text, we propose to produce a speech example via either a speech synthesizer or a human. We then use this example to guide source separation and, for that purpose, we introduce a new variant of the nonnegative matrix partial co-factorization (NMPCF) model based on a so-called excitation-filter-channel speech model. Such a modeling allows sharing the linguistic information between the speech example and the speech in the mixture. The corresponding multiplicative update (MU) rules are eventually derived for the parameters estimation and several extensions of the model are proposed and investigated.

Most of this work was done while the first author was with Technicolor, and a part of the work has been presented at the 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP) [1]. L. Le Magoarou, Inria Rennes - Bretagne Atlantique.

“…It is well adapted to scenarios where the original sources are not available but high separation quality is nevertheless required. The additional information can be of different types: spatial and spectral information about the sources [5], [6], language structure [7], visual information [8], information about the recording/mixing conditions [9], musical scores [10]-[13], or user input [14]-[21]. For instance, the user can provide relevant information by drawing the fundamental frequency curve [18], by uttering the same sentence [16], by humming the melody [14], or even by selecting specific areas in the spectrogram of the mixture [17].…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Channel Audio Source Separation Using Multiple Deformed References

Souviraà-Labastie

Olivero

Vincent

et al. 2015

IEEE/ACM Trans. Audio Speech Lang. Process.

We present a general multi-channel source separation framework where additional audio references are available for one (or more) source(s) of a given mixture. Each audio reference is another mixture which is supposed to contain at least one source similar to one of the target sources. Deformations between the sources of interest and their references are modeled in a linear manner using a generic formulation. This is done by adding transformation matrices to an excitation-filter model, hence affecting different axes, namely frequency, dictionary component or time. A nonnegative matrix co-factorization algorithm and a generalized expectation-maximization algorithm are used to estimate the parameters of the model. Different model parameterizations and different combinations of algorithms are tested on music plus voice mixtures guided by music and/or voice references and on professionally-produced music recordings guided by cover references. Our algorithms improve the signal-to-distortion ratio (SDR) of the sources with the lowest intensity by 9 to 15 decibels (dB) with respect to original mixtures.

Index Terms: Generalized Expectation-Maximization (GEM) algorithm, source separation.
