Richard C. Hendriks scite author profile

Abstract: Recently, it has been proposed to estimate the noise power spectral density by means of minimum mean-square error (MMSE) optimal estimation. We show that the resulting estimator can be interpreted as a voice activity detector (VAD)-based noise power estimator, where the noise power is updated only when speech absence is signaled, combined with a required bias compensation. We show that the bias compensation is unnecessary when we replace the VAD by a soft speech presence probability (SPP) with fixed priors. Choosing fixed priors also has the benefit of decoupling the noise power estimator from subsequent steps in a speech enhancement framework, such as the estimation of the speech power and the estimation of the clean speech. We show that the proposed SPP approach maintains the quick noise tracking performance of the bias compensated MMSE-based approach while exhibiting less overestimation of the spectral noise power and an even lower computational complexity. Index Terms: Noise power estimation, speech enhancement.

A short-time objective intelligibility measure for time-frequency weighted noisy speech

Taal

et al. 2010

Existing objective speech-intelligibility measures are suitable for several types of degradation; however, they turn out to be less appropriate for methods where noisy speech is processed by a time-frequency (TF) weighting, e.g., noise reduction and speech separation. In this paper, we present an objective intelligibility measure that shows high correlation (rho=0.95) with the intelligibility of both noisy and TF-weighted noisy speech. The proposed method shows significantly better performance than three other, more sophisticated, objective measures. Furthermore, it is based on an intermediate intelligibility measure for short-time (approximately 400 ms) TF-regions, and uses a simple DFT-based TF-decomposition. In addition, a free Matlab implementation is provided. Index Terms: intelligibility prediction, speech enhancement, noisy speech.
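The abstract above gives no formulas, so the following is only a rough Python sketch of what an intermediate measure over short-time TF-regions could look like. It assumes the measure is a correlation coefficient between clean and processed band envelopes over a sliding segment, averaged over bands and segments. The function name `short_time_correlation`, the `(bands, frames)` envelope layout, and the default `seg_len=30` frames are hypothetical, and any normalization or clipping step the published measure may use is left out.

```python
import numpy as np


def short_time_correlation(clean_env, processed_env, seg_len=30):
    """Average short-time correlation between two TF envelope matrices.

    clean_env, processed_env: arrays of shape (bands, frames).
    seg_len: segment length in frames (hypothetical default; choose it so
             that a segment spans roughly 400 ms at your frame rate).
    """
    clean_env = np.asarray(clean_env, dtype=float)
    processed_env = np.asarray(processed_env, dtype=float)
    if clean_env.shape != processed_env.shape:
        raise ValueError("envelopes must have the same shape")
    n_frames = clean_env.shape[1]
    if n_frames < seg_len:
        raise ValueError("need at least seg_len frames")

    scores = []
    for end in range(seg_len, n_frames + 1):
        x = clean_env[:, end - seg_len:end]
        y = processed_env[:, end - seg_len:end]
        # Remove the per-band mean so the inner product becomes a correlation.
        xc = x - x.mean(axis=1, keepdims=True)
        yc = y - y.mean(axis=1, keepdims=True)
        denom = np.linalg.norm(xc, axis=1) * np.linalg.norm(yc, axis=1)
        scores.append(np.sum(xc * yc, axis=1) / np.maximum(denom, 1e-12))
    # Average over all segments and bands.
    return float(np.mean(scores))
```

Under this simplification, identical envelopes score 1 and an inverted envelope scores -1; the result is not calibrated to listening-test scores.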

Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors

Erkelens

Hendriks

Heusdens

et al. 2007

IEEE Trans. Audio Speech Lang. Process.

228

199

MMSE based noise PSD tracking with low complexity

2010

Noise power estimation based on the probability of speech presence

2011

In this paper, we analyze the minimum mean square error (MMSE) based spectral noise power estimator [1] and present an improvement. We will show that the MMSE based spectral noise power estimate is only updated when the a posteriori signal-to-noise ratio (SNR) is lower than one. This threshold on the a posteriori SNR can be interpreted as a voice activity detector (VAD). We propose in this work to replace the hard decision of the VAD by a soft speech presence probability (SPP). We show that by doing so, the proposed estimator does not require a bias correction and safety-net as is required by the MMSE estimator presented in [1]. At the same time, the proposed estimator maintains the quick noise tracking capability that is characteristic of the MMSE noise tracker, results in less noise power overestimation, and is computationally less expensive.
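The soft-decision idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' reference implementation: it assumes complex Gaussian speech and noise DFT coefficients, a fixed prior speech presence probability, and a fixed a priori SNR under speech presence. The function name `spp_noise_update` and the default values (`xi_opt_db=15.0`, `alpha=0.8`, `prior_speech=0.5`) are assumptions that do not appear in the listing, and any safeguard against the estimate stagnating is omitted.

```python
import numpy as np


def spp_noise_update(noisy_periodogram, noise_psd_prev,
                     xi_opt_db=15.0, alpha=0.8, prior_speech=0.5):
    """One frame of a soft, SPP-weighted noise PSD update (sketch).

    noisy_periodogram: |Y|^2 per frequency bin for the current frame.
    noise_psd_prev:    noise PSD estimate from the previous frame.
    xi_opt_db:         assumed fixed a priori SNR under speech presence (dB).
    alpha:             assumed recursive smoothing constant.
    prior_speech:      assumed fixed prior probability of speech presence.
    """
    noisy_periodogram = np.asarray(noisy_periodogram, dtype=float)
    noise_psd_prev = np.maximum(np.asarray(noise_psd_prev, dtype=float), 1e-12)

    xi_opt = 10.0 ** (xi_opt_db / 10.0)
    prior_ratio = (1.0 - prior_speech) / prior_speech

    # A posteriori SNR, based on the previous noise PSD estimate.
    post_snr = noisy_periodogram / noise_psd_prev

    # Posterior speech presence probability under the Gaussian model.
    exponent = -post_snr * xi_opt / (1.0 + xi_opt)
    spp = 1.0 / (1.0 + prior_ratio * (1.0 + xi_opt) * np.exp(exponent))

    # Soft decision: trust the observation when speech is likely absent,
    # and keep the previous estimate when speech is likely present.
    noise_periodogram_est = (1.0 - spp) * noisy_periodogram + spp * noise_psd_prev

    # Recursive smoothing over time.
    noise_psd = alpha * noise_psd_prev + (1.0 - alpha) * noise_periodogram_est
    return noise_psd, spp
```

Calling the function once per frame, with the returned `noise_psd` fed back as `noise_psd_prev`, gives a running estimate. Because the weight `spp` varies smoothly with the a posteriori SNR, there is no hard threshold in this update.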

DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement: A Survey of the State of the Art

Hendriks

Gerkmann

Jensen

2013

Synthesis Lectures on Speech and Audio Processing

100

Noise Tracking Using DFT Domain Subspace Decompositions

Hendriks

Jensen

Heusdens

2008

IEEE Trans. Audio Speech Lang. Process.

Abstract: All discrete Fourier transform (DFT) domain-based speech enhancement gain functions rely on knowledge of the noise power spectral density (PSD). Since the noise PSD is unknown in advance, estimation from the noisy speech signal is necessary. An overestimation of the noise PSD will lead to a loss in speech quality, while an underestimation will lead to an unnecessarily high level of residual noise. We present a novel approach for noise tracking, which updates the noise PSD for each DFT coefficient in the presence of both speech and noise. This method is based on the eigenvalue decomposition of correlation matrices that are constructed from time series of noisy DFT coefficients. The presented method is well capable of tracking gradually changing noise types. In comparison to state-of-the-art noise tracking algorithms, the proposed method reduces the estimation error between the estimated and the true noise PSD. In combination with an enhancement system, the proposed method improves the segmental SNR by several decibels for gradually changing noise types. Listening experiments show that the proposed system is preferred over the state-of-the-art noise tracking algorithm. Index Terms: Discrete Fourier transform (DFT) domain subspace decompositions, noise tracking, speech enhancement.
