We describe a method to estimate the power spectral density of nonstationary noise when a noisy speech signal is given. The method can be combined with any speech enhancement algorithm that requires a noise power spectral density estimate. In contrast to other methods, our approach does not use a voice activity detector. Instead, it tracks spectral minima in each frequency band without distinguishing between speech activity and speech pauses. By minimizing a conditional mean-square estimation error criterion at each time step, we derive the optimal smoothing parameter for recursive smoothing of the power spectral density of the noisy speech signal. Based on the optimally smoothed power spectral density estimate and an analysis of the statistics of spectral minima, an unbiased noise estimator is developed. The estimator is well suited for real-time implementations. Furthermore, to improve performance in nonstationary noise, we introduce a method to speed up the tracking of the spectral minima. Finally, we evaluate the proposed method in the context of speech enhancement and low-bit-rate speech coding with various noise types.
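The core idea of tracking spectral minima without a voice activity detector can be illustrated with a minimal sketch. It is not the estimator described in the abstract: it assumes a fixed smoothing parameter `alpha` instead of the optimal time-varying one, omits the bias compensation and the faster minimum tracking, and the function name `track_noise_psd` and the `window` length are hypothetical choices made for illustration only.

```python
import numpy as np

def track_noise_psd(noisy_psd, alpha=0.85, window=50):
    """Rough noise PSD estimate by minimum tracking (illustrative only).

    noisy_psd: array of shape (frames, bins) holding periodograms |Y|^2.
    alpha:     fixed recursive smoothing parameter (assumption; the
               abstract derives an optimal, time-varying value).
    window:    number of past frames searched for the minimum (assumption).
    """
    noisy_psd = np.asarray(noisy_psd, dtype=float)
    num_frames = noisy_psd.shape[0]
    smoothed = np.empty_like(noisy_psd)
    noise = np.empty_like(noisy_psd)

    state = noisy_psd[0].copy()
    for t in range(num_frames):
        # First-order recursive smoothing of the noisy periodogram.
        state = alpha * state + (1.0 - alpha) * noisy_psd[t]
        smoothed[t] = state
        # Minimum of the smoothed PSD over the most recent frames,
        # taken per frequency bin, with no speech/pause decision.
        start = max(0, t - window + 1)
        noise[t] = smoothed[start:t + 1].min(axis=0)
    return noise

# Usage: a flat noise floor of 1.0 with a short high-energy burst
# standing in for speech activity.
psd = np.ones((200, 4))
psd[100:110] = 10.0
noise_estimate = track_noise_psd(psd)
```

Because the burst is shorter than the search window, the per-bin minimum keeps returning a value close to the noise floor while the burst is present. The uncompensated minimum is biased low for real, fluctuating noise, which is the effect the unbiased estimator in the abstract is designed to correct.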
The enhancement of short-term spectra of noisy speech can be achieved by statistical estimation of the clean speech spectral components. We present a minimum mean-square error estimator of the clean speech spectral magnitude that uses both a parametric compression function in the estimation error criterion and a parametric prior distribution for the statistical model of the clean speech magnitude. The novel parametric estimator includes many known magnitude estimators as special cases and, additionally, yields estimators that combine the beneficial properties of different known solutions. The new estimator is evaluated in terms of segmental SNR, speech distortion, and noise suppression.
This paper presents and compares algorithms for combined acoustic echo cancellation and noise reduction for hands-free telephones. A structure is proposed that consists of a conventional acoustic echo canceler and a frequency-domain postfilter in the sending path of the hands-free system. The postfilter applies the spectral weighting technique and attenuates both the background noise and the residual echo that remains after imperfect echo cancellation. Two weighting rules for the postfilter are discussed. The first is a conventional rule, known from noise reduction, which is extended to attenuate residual echo as well as noise. The second is a psychoacoustically motivated weighting rule. Both rules are evaluated and compared by instrumental measures and listening tests. They succeed about equally well in attenuating the noise and the residual echo. In listening tests, however, the psychoacoustically motivated weighting rule is mostly preferred, since it leads to more natural near-end speech and to less annoying residual noise.
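The spectral weighting step of such a postfilter can be sketched as follows. This is an illustration under stated assumptions, not either of the weighting rules compared in the paper: it uses a generic Wiener-type gain in which the residual echo PSD is simply added to the noise PSD, and the function names, the `gain_floor` parameter, and the assumption that both PSD estimates are already available are all hypothetical.

```python
import numpy as np

def postfilter_gain(input_psd, noise_psd, residual_echo_psd, gain_floor=0.1):
    """Wiener-type spectral weights per frequency bin (illustrative only).

    input_psd:         PSD of the echo canceler output.
    noise_psd:         estimated background noise PSD (assumed given).
    residual_echo_psd: estimated residual echo PSD (assumed given).
    gain_floor:        lower bound on the gain (hypothetical parameter).
    """
    input_psd = np.maximum(np.asarray(input_psd, dtype=float), 1e-12)
    # Treat noise and residual echo as one combined disturbance.
    disturbance = (np.asarray(noise_psd, dtype=float)
                   + np.asarray(residual_echo_psd, dtype=float))
    gain = 1.0 - disturbance / input_psd
    # Limit the gain to [gain_floor, 1] so bins are attenuated, never boosted.
    return np.clip(gain, gain_floor, 1.0)

def apply_postfilter(spectrum, noise_psd, residual_echo_psd, gain_floor=0.1):
    """Scale each complex bin of one frame by its real-valued gain."""
    gain = postfilter_gain(np.abs(spectrum) ** 2, noise_psd,
                           residual_echo_psd, gain_floor)
    return gain * spectrum

# Usage: one frame with three bins; the first is dominated by near-end
# speech, the other two by noise and residual echo.
frame = np.array([10.0 + 0.0j, 1.0 + 0.0j, 0.0 + 0.5j])
noise = np.full(3, 0.5)
echo = np.full(3, 0.5)
gains = postfilter_gain(np.abs(frame) ** 2, noise, echo)
filtered = apply_postfilter(frame, noise, echo)
```

The gain is real-valued, so the phase of each bin is left unchanged and only the magnitude is attenuated. The gain floor limits the maximum attenuation; a psychoacoustically motivated rule would choose the weights differently, for example by taking auditory masking into account.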