2011 17th International Conference on Digital Signal Processing (DSP) 2011
DOI: 10.1109/icdsp.2011.6004924

Single channel speech music separation using nonnegative matrix factorization and spectral masks

Abstract: A single channel speech-music separation algorithm based on nonnegative matrix factorization (NMF) with spectral masks is proposed in this work. The proposed algorithm uses training data of speech and music signals with nonnegative matrix factorization followed by masking to separate the mixed signal. In the training stage, NMF uses the training data to train a set of basis vectors for each source. These bases are trained using NMF in the magnitude spectrum domain. After observing the mixed signal, NMF is used…

Citations: cited by 71 publications (53 citation statements)
References: 7 publications
“…Weights for the basis vectors appear in the corresponding columns of matrix W. To approximate the data in V as a non-negative linear combination of its component vectors, the non-negative basis vectors in matrix B are optimized. The matrices B and W are estimated by solving the following optimization problem [7]:…”
Section: A Training Of Speech and Music (mentioning)
confidence: 99%
“…With two sets of training data for speech and music signals, the Fast Fourier Transform (FFT) is computed for each signal to obtain the magnitude spectrograms of the speech and music signals. Then, NMF is used to decompose the speech and music spectrograms into basis and weight matrices. In other words, the aim of using NMF is to model the training data as a set of basis vectors that represent the spectral characteristics of each source signal [7]…”
Section: A Training Of Speech and Music (mentioning)
confidence: 99%
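The first step in the statement above (framing each training signal, applying the FFT, and keeping the magnitudes) can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name `magnitude_spectrogram`, the Hann window, and the frame and hop sizes are all assumptions chosen for the example.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=512, hop=256):
    """Return a (frame_len // 2 + 1, n_frames) magnitude spectrogram.

    Hypothetical helper: window type, frame length, and hop size are
    illustrative choices, not values taken from the cited papers.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Stack windowed frames as columns: shape (frame_len, n_frames).
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    # Real FFT along the frame axis, then discard the phase.
    return np.abs(np.fft.rfft(frames, axis=0))

# Toy input: one second of a 1 kHz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
V = magnitude_spectrogram(tone)  # nonnegative matrix, ready for NMF
```

Each column of `V` is one short-time magnitude spectrum, so the matrix is nonnegative by construction, which is the property NMF requires of its input.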
“…For comparison to the DNN approach, an equivalent non-negative matrix factorization (NMF) based approach was implemented using the same training and test data (as described above). We used the same unpacking strategy, which has been tested before for NMF-based separation of speech and music [13]. The spectrograms of the training data were sampled and unpacked analogously to the DNN approach, resulting in a large (220500x15000) matrix that was then decomposed using the traditional multiplicative updates algorithm with KL divergence [14].…”
Section: Methods (mentioning)
confidence: 99%
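The "traditional multiplicative updates algorithm with KL divergence" mentioned above is commonly written as the pair of update rules sketched below. This is a generic textbook-style sketch under stated assumptions, not the implementation used in the citing paper: the names `nmf_kl` and `kl_divergence`, the random initialization, the iteration count, and the small constant `eps` are hypothetical. The code follows that statement's notation, where W holds the basis vectors and H the activations.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-10):
    """Generalized KL divergence D(V || WH), summed over all entries."""
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)

def nmf_kl(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor V (n_freq x n_frames) as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # basis vectors (columns)
    H = rng.random((rank, n_frames)) + eps  # activations (rows)
    for _ in range(n_iter):
        # Update activations: H <- H * (W^T (V / WH)) / (W^T 1)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update bases: W <- W * ((V / WH) H^T) / (1 H^T)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

Because every update multiplies by a ratio of nonnegative quantities, W and H stay nonnegative without any explicit projection step.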
“…For the testing stage, we concatenated both matrices and initialized a corresponding H_u matrix randomly, so that for each unpacked spectrogram, V_u, of the set of test songs, V_u = [W_v W_nv] H_u. We then ran the same multiplicative updates algorithm but keeping the composite W_u matrix fixed [13], and updating H_u. The test spectrogram was then re-composed for either vocal (V_v = W_v H_v) or non-vocal (V_nv = W_nv H_nv) vectors, and used to define a soft mask via the element-wise division…”
Section: Methods (mentioning)
confidence: 99%
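The testing stage described above (fixed concatenated bases, updated activations, then a soft mask) can be sketched as follows. The quoted statement is truncated before the mask formula, so the ratio used here, V_v / (V_v + V_nv), is an assumption based on the common Wiener-like soft mask rather than the citing paper's exact definition. The function names, iteration count, and `eps` are likewise hypothetical.

```python
import numpy as np

def activations_fixed_bases(V, W, n_iter=200, eps=1e-10, seed=0):
    """KL multiplicative updates for H only; the bases W are never changed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps  # W^T 1, constant since W is fixed
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / col_sums
    return H

def separate(V_mix, W_v, W_nv, n_iter=200, eps=1e-10):
    """Split a mixture spectrogram using two pre-trained basis sets."""
    W_u = np.hstack([W_v, W_nv])            # composite basis matrix
    H_u = activations_fixed_bases(V_mix, W_u, n_iter=n_iter, eps=eps)
    k = W_v.shape[1]
    V_v = W_v @ H_u[:k]                      # vocal (or speech) estimate
    V_nv = W_nv @ H_u[k:]                    # non-vocal (or music) estimate
    # Assumed soft mask: element-wise ratio of one estimate to their sum.
    mask = V_v / (V_v + V_nv + eps)
    return mask * V_mix, (1.0 - mask) * V_mix

# Toy mixture built from random bases, only to exercise the code path.
rng = np.random.default_rng(0)
W_v = rng.random((16, 3))
W_nv = rng.random((16, 3))
V_mix = W_v @ rng.random((3, 10)) + W_nv @ rng.random((3, 10))
est_v, est_nv = separate(V_mix, W_v, W_nv)
```

Applying the mask to the mixture, instead of using the reconstructions V_v and V_nv directly, means the two estimates always sum back to the observed mixture spectrogram.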