Abstract: Desirable properties of real-world speech enhancement methods include online operation, single-channel operation, operation in the presence of a variety of noise types including non-stationary noise, and no requirement for isolated training examples of the specific speaker and noise type at hand. Methods in the literature typically possess only a subset of these properties. Source separation methods in particular rarely possess both the first and the last of these properties. We extend universal speech model-based…
“…1 is only describing a basic separation system to help focus on the selection of the divergence cost function to be used under the sparse and low-rank framework. The obtained performance can, however, be further improved through techniques such as adopting a universal speech dictionary [23], imposing temporal continuity on the sparse matrix [24], using an information fusion strategy [25], or combining with autocorrelation [26]. The use of these techniques for performance improvement is beyond the scope of this paper, and will be explored in future work.…”
This paper addresses the problem of unsupervised speech separation based on robust non‐negative matrix factorization (RNMF) with β‐divergence, when neither speech nor noise training data is available beforehand. We propose a robust version of non‐negative matrix factorization, inspired by the recently developed sparse and low‐rank decomposition, in which the data matrix is decomposed into the sum of a low‐rank matrix and a sparse matrix. Efficient multiplicative update rules to minimize the β‐divergence‐based cost function are derived. A convolutional extension of the algorithm is also presented, which accounts for the time dependency of the non‐negative noise bases. Experimental speech separation results show that the proposed convolutional RNMF successfully separates the repeating time‐varying spectral structures from the magnitude spectrum of the mixture, and does so without any prior training.
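The abstract states that multiplicative update rules are derived for the β-divergence cost but does not reproduce them. The following is a minimal sketch, assuming the decomposition V ≈ WH + S (low-rank W H for the repeating noise, non-negative sparse S for speech), an L1 penalty lam·ΣS on the sparse part, and the standard heuristic β-divergence multiplicative updates applied block by block. These are assumptions: the paper's exact update rules and its convolutional extension may differ, and the names `rnmf_beta`, `lam`, and `eps` are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def beta_divergence(V, X, beta):
    """Sum of elementwise beta-divergences d_beta(v | x); entries must be > 0."""
    total = 0.0
    for v_row, x_row in zip(V, X):
        for v, x in zip(v_row, x_row):
            if beta == 0:    # Itakura-Saito divergence
                total += v / x - math.log(v / x) - 1.0
            elif beta == 1:  # generalized Kullback-Leibler divergence
                total += v * math.log(v / x) - v + x
            else:
                total += (v**beta + (beta - 1) * x**beta
                          - beta * v * x**(beta - 1)) / (beta * (beta - 1))
    return total


def model(W, H, S, eps):
    """Low-rank part W @ H plus sparse part S (eps keeps entries positive)."""
    WH = matmul(W, H)
    return [[wh + s + eps for wh, s in zip(wh_row, s_row)]
            for wh_row, s_row in zip(WH, S)]


def cost(V, W, H, S, beta, lam, eps=1e-9):
    """Assumed penalized objective: D_beta(V | WH + S) + lam * sum(S)."""
    return beta_divergence(V, model(W, H, S, eps), beta) + lam * sum(map(sum, S))


def rnmf_beta(V, rank, beta=1.0, lam=0.1, n_iter=100, eps=1e-9, seed=0):
    """Hypothetical sketch: V ~ W @ H (low-rank noise) + S (sparse speech)."""
    F, T = len(V), len(V[0])
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(T)] for _ in range(rank)]
    S = [[rng.random() + 0.1 for _ in range(T)] for _ in range(F)]

    def weights(W, H, S):
        # P = V * L^(beta-2) and Q = L^(beta-1), elementwise, with L = WH + S.
        L = model(W, H, S, eps)
        P = [[v * l ** (beta - 2) for v, l in zip(vr, lr)] for vr, lr in zip(V, L)]
        Q = [[l ** (beta - 1) for l in lr] for lr in L]
        return P, Q

    for _ in range(n_iter):
        P, Q = weights(W, H, S)  # update W with H and S fixed
        Ht = transpose(H)
        num, den = matmul(P, Ht), matmul(Q, Ht)
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]

        P, Q = weights(W, H, S)  # update H with W and S fixed
        Wt = transpose(W)
        num, den = matmul(Wt, P), matmul(Wt, Q)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]

        P, Q = weights(W, H, S)  # update S; the L1 penalty adds lam to the denominator
        S = [[s * p / (q + lam) for s, p, q in zip(sr, pr, qr)]
             for sr, pr, qr in zip(S, P, Q)]
    return W, H, S


# Toy "magnitude spectrogram": a rank-1 background plus two sparse bursts.
u = [1.0, 2.0, 0.5, 1.5, 1.0, 0.8]
v = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 1.0]
V = [[ui * vj for vj in v] for ui in u]
V[1][3] += 4.0
V[4][6] += 3.0
W, H, S = rnmf_beta(V, rank=1, n_iter=200)
```

With β = 1 (generalized KL) the block-wise updates above are majorize-minimize steps, so the penalized cost should not increase from one update to the next; larger `lam` pushes more energy into the low-rank part and leaves a sparser speech estimate S.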
This paper introduces a constrained source/filter model for semi-supervised speech separation based on non-negative matrix factorization (NMF). The objective is to inform NMF with prior knowledge about speech, providing a physically meaningful speech separation. To do so, a source/filter model (referred to as the Instantaneous Mixture Model, or IMM) is integrated into the NMF. Furthermore, constraints are added to the IMM-NMF in order to control the NMF behaviour during separation and to enforce its physical meaning. In particular, a speech-specific constraint, based on the source/filter coherence of speech, and a method for automatically adapting the constraints' weights during separation are presented. The proposed source/filter model is also semi-supervised: during training, one filter basis is estimated for each phoneme of a speaker; during separation, the estimated filter bases are then used in the constrained source/filter model. An experimental evaluation of speech separation was conducted on TIMIT speakers mixed with various environmental background noises from the QUT-NOISE database. This evaluation showed that the use of adaptive constraints increases the performance of the source/filter model for speaker-dependent speech separation, and compares favorably to fully supervised speech separation.
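The abstract describes the source/filter structure only in words. The following is a minimal sketch, assuming the common source/filter form in which the speech spectrogram is the elementwise product of a source (excitation) part W_src H_src and a filter (spectral envelope) part W_filt H_filt, noise is modeled by ordinary NMF, and separation is done by Wiener-style soft masking. All names are hypothetical; the paper's coherence constraint, adaptive weights, and update rules are not modeled, and the activations are simply given rather than estimated.

```python
def matmul(A, B):
    """Dense product of two matrices stored as lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def hadamard(A, B):
    """Elementwise (Hadamard) product of two equally shaped matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def source_filter_separate(V, W_src, H_src, W_filt, H_filt, W_noise, H_noise, eps=1e-12):
    """Hypothetical sketch: split mixture V using a source/filter speech model."""
    # Speech model: source (excitation) part times filter (envelope) part.
    speech = hadamard(matmul(W_src, H_src), matmul(W_filt, H_filt))
    # Noise model: ordinary NMF.
    noise = matmul(W_noise, H_noise)
    # Wiener-style soft masks share each mixture bin between speech and noise.
    S_hat = [[v * s / (s + n + eps) for v, s, n in zip(vr, sr, nr)]
             for vr, sr, nr in zip(V, speech, noise)]
    N_hat = [[v * n / (s + n + eps) for v, s, n in zip(vr, sr, nr)]
             for vr, sr, nr in zip(V, speech, noise)]
    return S_hat, N_hat


# Toy example: 4 frequency bins, 3 frames.
W_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]    # two comb-like source bases
H_src = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]                 # source activity per frame
W_filt = [[1.0, 0.2], [0.8, 0.4], [0.3, 0.9], [0.1, 1.0]]  # one envelope per "phoneme", fixed after training
H_filt = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]                # which "phoneme" filter is active
W_noise = [[0.2], [0.2], [0.2], [0.2]]                     # flat noise spectrum
H_noise = [[1.0, 1.0, 1.0]]
V = [[1.0, 0.5, 0.8], [0.6, 1.2, 0.7], [0.5, 0.3, 0.9], [0.2, 0.4, 0.6]]
S_hat, N_hat = source_filter_separate(V, W_src, H_src, W_filt, H_filt, W_noise, H_noise)
```

Keeping W_filt fixed from training while re-estimating the other factors is one way to read the semi-supervised setup described above; here every factor is fixed only to keep the sketch short.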