2013
DOI: 10.1109/tasl.2013.2270369

Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization

Abstract: Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supe…
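The abstract contrasts unsupervised enhancement (e.g., Wiener filtering) with supervised, model-based approaches. As background for the NMF machinery the paper builds on, here is a minimal sketch of plain NMF via the Lee-Seung multiplicative updates for the Euclidean cost; the matrix sizes and function name are illustrative, not taken from the paper:

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (freq x time) as W @ H using
    Lee-Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        # Each update keeps W and H non-negative and never increases the cost.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy exactly-rank-2 "spectrogram", so the factorization can recover it.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 20))
W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the updates are multiplicative, non-negativity of W and H is preserved automatically, which is what makes the factors interpretable as spectral atoms and their activations.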


Cited by 365 publications (227 citation statements)
References 40 publications
“…Unlike traditional speech enhancement techniques (e.g., [1][2][3][4][5]), which focus on dealing with the noise-corrupted speech signal (i.e., speech-plus-noise mixture) and on removing background noise from the signal to achieve better listening experiences for listeners, these speech modification algorithms aim to alter the original clean speech signal so that the intelligibility may be preserved even when listened to in non-ideal listening conditions, in which background masking sources may exist. While the majority of modification algorithms operate in the frequency domain, such as enhancing frequency components which are important to speech intelligibility in noise [6][7][8] and boosting certain spectral regions based on optimising objective intelligibility metrics [9][10][11][12], other algorithms make changes in the time domain, including introducing pauses into speech and speeding up or slowing down part of the speech to avoid a temporal clash between the speech and masker [10,13].…”
Section: Introduction
confidence: 99%
“…(7) corresponds to a non-negative matrix factorization (NMF) model placed on the F × L matrix of variances of the source coefficients; a now common practice in audio signal processing, e.g. [16,17,18].…”
Section: The Source Model
confidence: 99%
“…Approaches based on statistical properties, such as minimum mean squared error (MMSE) estimation and optimally-modified log-spectral amplitude (OM-LSA), can take human hearing properties into account and reduce speech distortion and residual noise to some extent [1]. In recent years, supervised learning methods have seen significant development in speech signal processing [2][3]. As a well-known method for mining implicit local representations in non-negative data, non-negative matrix factorization (NMF) uses non-negative linear combinations to separate the clean and noise signals from noisy speech.…”
Section: Introduction
confidence: 99%
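The excerpt above describes how NMF separates clean and noise contributions from noisy speech. A minimal sketch of the supervised variant it alludes to, under the common assumption that speech and noise basis matrices have been pre-trained offline on clean-speech and noise spectrograms, and that a Wiener-style mask is formed from the two partial reconstructions; all array sizes and names here are illustrative:

```python
import numpy as np

def fit_activations(V, W, n_iter=300, eps=1e-10, seed=0):
    """With the basis W held fixed, learn non-negative activations H
    for the spectrogram V by Euclidean multiplicative updates."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Stand-ins for bases pre-trained on clean-speech / noise training data.
rng = np.random.default_rng(0)
W_speech = rng.random((8, 3))   # 8 frequency bins, 3 speech atoms
W_noise = rng.random((8, 2))    # 2 noise atoms
W = np.hstack([W_speech, W_noise])

V_mix = rng.random((8, 30))     # magnitude spectrogram of noisy speech
H = fit_activations(V_mix, W)
V_s = W_speech @ H[:3]          # speech-only reconstruction
V_n = W_noise @ H[3:]           # noise-only reconstruction

# Wiener-style mask: the speech share of the model's total reconstruction.
mask = V_s / (V_s + V_n + 1e-10)
V_speech_est = mask * V_mix     # enhanced magnitude spectrogram
```

Because both reconstructions are non-negative, the mask is automatically bounded in [0, 1], so the enhanced spectrogram never exceeds the mixture magnitude in any time-frequency bin.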