Non-negative dynamical system with application to speech and audio

Févotte, Cédric; Roux, Jonathan Le; Hershey, John R.

doi:10.1109/icassp.2013.6638240

Cited by 43 publications

(51 citation statements)

References 21 publications

“…At testing, the proposed method, like standard NMF approaches, treats different time-frames independently, ignoring the temporal dynamics of speech signals. Recent studies have proposed regularized variants of NMF or PLCA trying to overcome this limitation, including co-occurrence statistics of the basis functions [3], smoothness of the activation coefficients [17], and learned temporal dynamics [5,18,19]. In all these methods the model is expressed as the minimization of a cost with a data fitting term and some structure-promoting penalties.…”

Section: Introduction (mentioning)

confidence: 99%

Supervised non-euclidean sparse NMF via bilevel optimization with applications to speech enhancement

Sprechmann

Bronstein

Sapiro

2014

2014 4th Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA)

Traditionally, NMF algorithms consist of two separate stages: a training stage, in which a generative model is learned; and a testing stage in which the pre-learned model is used in a high level task such as enhancement, separation, or classification. As an alternative, we propose a task-supervised NMF method for the adaptation of the basis spectra learned in the first stage to enhance the performance on the specific task used in the second stage. We cast this problem as a bilevel optimization program that can be efficiently solved via stochastic gradient descent. The proposed approach is general enough to handle sparsity priors of the activations, and allow non-Euclidean data terms such as β-divergences. The framework is evaluated on single-channel speech enhancement tasks.
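
The citation statement for this entry describes regularized NMF variants as minimizing a cost made of a data-fitting term plus structure-promoting penalties. A generic way to write such a cost is sketched below. The symbols $V$ (observed non-negative spectrogram), $W$ (basis spectra), $H$ (activations), $D$ (a divergence such as a β-divergence), $\Psi$ (a penalty such as a sparsity or temporal-smoothness term) and $\lambda$ (its weight) are notation assumed here for illustration; none of them is taken from the cited papers.

```latex
\min_{W \ge 0,\; H \ge 0} \; D\left(V \,\middle\|\, W H\right) + \lambda \, \Psi(H)
```

Under this reading, the methods listed in the statement would differ mainly in the choice of $\Psi$, while the data-fitting term $D$ stays close to that of standard NMF.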

“…With respect to previous work on combining NMF with LDS: while the methods of [25] [26] are able to provide a component activation matrix that is able to evolve smoothly over time, in the present work we are primarily interested in using the LDS in a supervised scenario, to provide a mapping between the observed 'noisy' output of an event detection system and the latent 'true' sound event output, which is not possible using the aforementioned methods.…”

Section: A. Motivation and System Overview (mentioning)

confidence: 99%

“…Recently, two NMF-based models were proposed for speech denoising and separation tasks, which incorporated temporal constraints similar to those of an LDS. In [25], an extension of NMF was proposed which supported Markovian dynamics: the observation model operates similarly to standard NMF, while the latent dynamics capture statistical dependencies between time frames similarly to LDS. In [26], a dynamic NMF model is proposed, where the observation model is similar to NMF/PLCA and follows a multinomial distribution, and the encoding matrix dynamics are formulated using an autoregressive model.…”

Section: B. Linear Dynamical Systems (mentioning)

confidence: 99%

Polyphonic Sound Event Tracking Using Linear Dynamical Systems

Benetos

Lafay

Lagrange

et al. 2017

IEEE/ACM Trans. Audio Speech Lang. Process.

Abstract: In this paper, a system for polyphonic sound event detection and tracking is proposed, based on spectrogram factorisation techniques and state space models. The system extends probabilistic latent component analysis (PLCA) and is modelled around a 4-dimensional spectral template dictionary of frequency, sound event class, exemplar index, and sound state. In order to jointly track multiple overlapping sound events over time, the integration of linear dynamical systems (LDS) within the PLCA inference is proposed. The system assumes that the PLCA sound event activation is the (noisy) observation in an LDS, with the latent states corresponding to the true event activations. LDS training is achieved using fully observed data, making use of ground truth-informed event activations produced by the PLCA-based model. Several LDS variants are evaluated, using polyphonic datasets of office sounds generated from an acoustic scene simulator, as well as real and synthesized monophonic datasets for comparative purposes. Results show that the integration of LDS tracking within PLCA leads to an improvement of +8.5-10.5% in terms of frame-based F-measure as compared to the use of the PLCA model alone. In addition, the proposed system outperforms several state-of-the-art methods for the task of polyphonic sound event detection.
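
The abstract treats a noisy activation as the observation of an LDS whose latent state is the true activation. The idea can be illustrated with a scalar Kalman filter. This is only a minimal sketch: the scalar model, the function names `kalman_filter_1d` and `demo`, and all parameter values are assumptions made for illustration, whereas the system described in the abstract is multi-dimensional and is trained on ground-truth-informed activations.

```python
import random


def kalman_filter_1d(observations, a=1.0, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a * x_{t-1} + w_t, y_t = x_t + v_t.

    q and r are the (assumed) variances of the process noise w_t and the
    observation noise v_t. Returns the list of filtered state means.
    """
    x, p = x0, p0
    filtered = []
    for y in observations:
        # Predict: propagate the state mean and variance through the dynamics.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update: blend the prediction with the new observation.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        filtered.append(x)
    return filtered


def demo(seed=0, n=200):
    """Track a hypothetical on/off 'event activation' observed in noise."""
    rng = random.Random(seed)
    truth = [1.0 if 50 <= t < 150 else 0.0 for t in range(n)]
    noisy = [x + rng.gauss(0.0, 0.5) for x in truth]
    filtered = kalman_filter_1d(noisy)

    def mse(estimate):
        return sum((e - x) ** 2 for e, x in zip(estimate, truth)) / n

    return mse(noisy), mse(filtered)


if __name__ == "__main__":
    mse_noisy, mse_filtered = demo()
    print(f"MSE of raw observations: {mse_noisy:.4f}")
    print(f"MSE after filtering:     {mse_filtered:.4f}")
```

The filter trades a short lag at event onsets and offsets for a large reduction in frame-to-frame noise, which is the kind of smoothing an LDS layer can add on top of frame-wise activations.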

“…The general trade-off is that discrete-state approaches [4,5] can be more precise, especially in their temporal dynamics, whereas continuous approaches [6,7] can be more flexible with respect to gain and subspace variability.…”

Section: Introduction (mentioning)

confidence: 99%

“…Discrete state models, such as HMMs, represent dynamics using discrete state transitions over time [4,11]. Continuous state Gaussian dynamical models, such as linear dynamical systems (LDSs), have long been studied [12], and recently rich models of continuous dynamics have been extended to the NMF family using gamma-distributed models [6,7] in models known as non-negative dynamical systems (NDSs). There have also been combinations with discrete dynamics and NMF observation models [13].…”

Section: Introduction (mentioning)

confidence: 99%

Non-negative source-filter dynamical system for speech enhancement

Şimşekli

Roux

Hershey

2014

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

Model-based speech enhancement methods, which rely on separately modeling the speech and the noise, have been shown to be powerful in many different problem settings. When the structure of the noise can be arbitrary, which is often the case in practice, model-based methods have to focus on developing good speech models, whose quality will be key to their performance. In this study, we propose a novel probabilistic model for speech enhancement which precisely models the speech by taking into account the underlying speech production process as well as its dynamics. The proposed model follows a source-filter approach where the excitation and filter parts are modeled as non-negative dynamical systems. We present convergence-guaranteed update rules for each latent factor. In order to assess performance, we evaluate our model on a challenging speech enhancement task where the speech is observed under non-stationary noises recorded in a car. We show that our model outperforms state-of-the-art methods in terms of objective measures.
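
The citation statements for this entry describe non-negative dynamical systems as continuous dynamics brought into the NMF family through gamma-distributed models. One way to picture this is to sample from a toy generative model in which both the state transition and the observation are perturbed by unit-mean multiplicative gamma noise. The exact form below, the function name `simulate_nds`, and every parameter value are assumptions made for illustration; this is not a transcription of the cited models and it involves no inference.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def simulate_nds(W, A, h0, n_frames, alpha=10.0, nu=5.0, seed=0):
    """Sample activations H and observations V from a toy non-negative model.

    Assumed form (illustrative only):
        h_t = (A h_{t-1}) * eps_t,  eps_t ~ Gamma(shape=alpha, mean 1)
        v_t = (W h_t)     * eta_t,  eta_t ~ Gamma(shape=nu,    mean 1)
    where * is elementwise. Non-negative W, A and h0 keep every sample
    non-negative without any clipping.
    """
    rng = random.Random(seed)

    def unit_mean_gamma(shape):
        # random.gammavariate(alpha, beta) has mean alpha * beta.
        return rng.gammavariate(shape, 1.0 / shape)

    h = list(h0)
    H, V = [], []
    for _ in range(n_frames):
        h = [mean * unit_mean_gamma(alpha) for mean in matvec(A, h)]
        v = [mean * unit_mean_gamma(nu) for mean in matvec(W, h)]
        H.append(h)
        V.append(v)
    return H, V


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # 3 frequency bins, 2 components
    A = [[0.9, 0.1], [0.1, 0.9]]               # slowly mixing activations
    H, V = simulate_nds(W, A, h0=[1.0, 1.0], n_frames=5)
    for t, (h, v) in enumerate(zip(H, V)):
        print(t, [round(x, 3) for x in h], [round(x, 3) for x in v])
```

Multiplicative rather than additive noise is what keeps the latent state non-negative here, in contrast to a Gaussian LDS, where the state can take either sign.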
