This paper examines the influence of pre-whitening on fundamental frequency estimation in noisy conditions. Parametric fundamental frequency estimators commonly assume that the noise is white and Gaussian and are therefore statistically efficient only under those conditions. In many practical applications the noise is coloured, which often leads to misidentifying an integer divisor or multiple of the true fundamental frequency (i.e., octave errors). The purpose of this paper is to determine whether pre-whitening can reduce this problem, based on noise statistics obtained from existing noise PSD estimation algorithms. For this purpose, different noise types and prediction orders of LPC pre-whitening are considered. The results show that pre-whitening significantly improves the estimation accuracy of an NLS pitch estimator when the noise is fairly stationary. For nonstationary noise, the improvements are modest at best, but we hypothesize that this is due to the noise PSD estimation performance rather than the LPC pre-whitening principle.
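The LPC pre-whitening idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a noise-only segment is available (in practice the noise statistics would come from a noise PSD estimator, which is not modelled here), fits an AR model to it with the Levinson-Durbin recursion, and applies the resulting prediction-error filter. The function names `autocorr`, `levinson_durbin`, and `prewhiten` are hypothetical and are not taken from the paper.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an AR model of the given order.

    Returns (a, err) where a = [1, a1, ..., ap] defines the prediction-error
    filter A(z) = 1 + a1 z^-1 + ... + ap z^-p and err is the residual power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                      # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err


def prewhiten(x, a):
    """Filter x with the FIR prediction-error filter A(z) (zero initial state)."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(p, n) + 1))
            for n in range(len(x))]
```

In use, the coefficients would be estimated from the noise and the same filter applied to the noisy observation, e.g. `a, _ = levinson_durbin(autocorr(noise, p), p)` followed by `prewhiten(noisy, a)`, where the prediction order `p` is a free parameter. The filter also changes the harmonic amplitudes of the signal of interest, which a downstream pitch estimator has to tolerate.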
Several speech processing methods assume that a clean signal is observed in white Gaussian noise (WGN). An argument against those methods is that the WGN assumption is not valid in many real acoustic scenarios. To take into account the coloured nature of the noise, a pre-whitening filter which renders the background noise closer to white can be applied. This paper introduces an adaptive pre-whitener based on a supervised non-negative matrix factorization (NMF), in which a pre-trained dictionary includes parametrized spectral information about the noise and speech sources in the form of autoregressive (AR) coefficients. Results show that the noise can get closer to white, in comparison to pre-whiteners based on conventional noise power spectral density (PSD) estimates such as minimum statistics and MMSE. A better pitch estimation accuracy can be achieved as well. Speech enhancement based on the WGN assumption shows a similar performance to the conventional enhancement which makes use of the background noise PSD estimate, which reveals that the proposed pre-whitener can preserve the signal of interest.
Most parametric fundamental frequency estimators implicitly assume that any corrupting noise is additive, white, and Gaussian. Under this assumption, the maximum likelihood (ML) and least squares estimators coincide and are statistically efficient. In the coloured noise case, however, the estimators differ, and the spectral shape of the corrupting noise should be taken into account. To allow for this, we propose two schemes that refine the noise statistics and parameter estimates iteratively, one based on an approximate ML solution and the other based on removing the periodic signal obtained from a linearly constrained minimum variance (LCMV) filter. Evaluations on real speech data indicate that the iteration steps improve the estimation accuracy, thereby offering an improvement over traditional non-parametric fundamental frequency methods in most of the evaluated scenarios.
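For reference, the white-noise baseline that this abstract starts from can be sketched as follows. Under the white Gaussian assumption the ML estimator reduces to nonlinear least squares (NLS); the code below uses the common harmonic-summation approximation of the NLS cost (the summed periodogram at the candidate's harmonics), which is an assumption made here for brevity and is not the iterative scheme proposed in the paper. The function names and the grid search parameters are hypothetical.

```python
import cmath
import math


def harmonic_power(x, f0, fs, num_harmonics):
    """Sum of |X(l * f0)|^2 over harmonics l = 1..num_harmonics."""
    total = 0.0
    for l in range(1, num_harmonics + 1):
        w = 2.0 * math.pi * l * f0 / fs       # harmonic frequency in rad/sample
        acc = sum(x[n] * cmath.exp(-1j * w * n) for n in range(len(x)))
        total += abs(acc) ** 2
    return total


def estimate_f0(x, fs, num_harmonics, f_min, f_max, step=1.0):
    """Grid search for the f0 (in Hz) that maximises the harmonic power.

    The caller should keep num_harmonics * f_max below fs / 2.
    """
    best_f0, best_cost = f_min, -1.0
    f0 = f_min
    while f0 <= f_max:
        cost = harmonic_power(x, f0, fs, num_harmonics)
        if cost > best_cost:
            best_f0, best_cost = f0, cost
        f0 += step
    return best_f0
```

Because the cost only measures energy at the candidate's harmonics, strongly coloured noise can raise the cost at a sub-multiple or multiple of the true fundamental frequency, which is the situation the coloured-noise refinements in the abstract aim to address.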
Optimal linear filtering has been used extensively for speech enhancement. In this paper, we take a first step towards applying linear filtering to the decomposition of a noisy speech signal into its components. The problem of decomposing speech into its voiced and unvoiced components is treated as an estimation problem. Assuming a harmonic model for the voiced speech, we propose a Wiener filtering scheme which estimates both components separately in the presence of noise. We show under which conditions this optimal filtering formulation outperforms two state-of-the-art speech decomposition methods, as supported by objective measures, spectrograms and informal listening tests. Index Terms: Speech decomposition, time-domain filtering, Wiener filter, voiced speech, unvoiced speech.