Although linear filters are useful in various speech processing applications, there is considerable evidence of nonlinearity in speech signals. Our main aim is to investigate comprehensively the use of nonlinear Volterra filters in ADPCM-based speech coding, using two methods: forward prediction, based on the least-squares (LS) criterion, and backward prediction, based on both the LMS and RLS adaptation algorithms. In each case, after solving some inherent problems, such as ill-conditioning and instability, schemes for the optimum exploitation of nonlinear prediction are developed, and simulation results, evaluated against several performance criteria, are provided. With forward prediction, a scheme is developed to detect and flag those frames for which, after stabilization, including the quadratic predictor is beneficial. Scalar and vector quantisation methods are used for quantising the residual signal and the filter parameters, respectively. The results show that this scheme achieves only a negligible improvement (up to 0.62 dB in the SNR), despite the increase in bit rate and complexity. With backward prediction, two frame-based schemes are developed in which, for each frame, a set of quadratic filters is examined and the filter that gives the best quality of reconstructed speech is selected. The final schemes improve the overall SNR of the reconstructed speech by up to 1.5 dB, at the cost of a slight increase in bit rate, a short delay and a substantial increase in complexity.
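To make the idea of a quadratic (second-order Volterra) predictor concrete, the following is a minimal sketch, not the scheme developed in the work itself. It assumes a normalised variant of LMS (chosen here only for step-size robustness), omits the quantiser entirely, and adapts on the clean signal rather than on reconstructed speech as a real backward-adaptive ADPCM coder would. The names `volterra_features` and `volterra_lms_predict`, the predictor order, and the step size `mu` are all hypothetical.

```python
import math


def volterra_features(history):
    """Return the linear terms followed by the unique quadratic
    products history[i] * history[j] for i <= j."""
    feats = list(history)
    n = len(history)
    for i in range(n):
        for j in range(i, n):
            feats.append(history[i] * history[j])
    return feats


def volterra_lms_predict(signal, order=2, mu=0.5, eps=1e-8):
    """Sample-by-sample prediction with a second-order Volterra filter.

    Weights are updated with normalised LMS (an assumption of this sketch)
    using past samples only. Returns (predictions, errors).
    """
    n_feats = order + order * (order + 1) // 2
    w = [0.0] * n_feats
    preds, errs = [], []
    for n in range(len(signal)):
        # Past samples x[n-1], ..., x[n-order]; zeros before the start.
        history = [signal[n - k] if n - k >= 0 else 0.0
                   for k in range(1, order + 1)]
        x = volterra_features(history)
        y = sum(wi * xi for wi, xi in zip(w, x))  # prediction
        e = signal[n] - y                          # prediction residual
        norm = eps + sum(xi * xi for xi in x)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        preds.append(y)
        errs.append(e)
    return preds, errs


# Illustrative input: a sinusoid, which a second-order predictor can track.
demo = [math.sin(0.3 * n) for n in range(2000)]
preds, errs = volterra_lms_predict(demo, order=2, mu=0.5)
signal_energy = sum(s * s for s in demo)
error_energy = sum(e * e for e in errs)
```

In a coder, the residual `errs` is what would be quantised and transmitted; the ratio of `signal_energy` to `error_energy` gives a rough prediction gain for this toy input.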
A new and effective algorithm for speech enhancement is proposed in this paper, based on Gaussian Mixture Modelling (GMM) and the Minimum Mean Square Error (MMSE) criterion, in which no assumption is made about the nature or stationarity of the noise. Neither Voice Activity Detection (VAD) nor any other means is used to estimate the input Signal to Noise Ratio (SNR). The mean vectors of the mixture models of spectral magnitudes, derived from models of the power spectra of speech and of different noise sources, are used to form sets of over-determined systems of equations, one set per candidate noise source, whose solutions lead to the MMSE estimates of the speech and additive noise spectral magnitudes. The corresponding power spectra are then used for noise suppression by Wiener filtering carried out on overlapping frames. The input SNR is estimated, and the nature of the noise involved is determined, as by-products of the method. Results are compared with codebook-constrained methods, which have shown very good results but suffer from long processing times. It is shown that, at the cost of a slightly lower improvement in SNR and PESQ score, the new algorithm reduces the computation time to one fifth, which makes it suitable for practical applications.
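The final noise-suppression step can be illustrated with a short sketch. It assumes that per-bin estimates of the speech and noise power spectra for one frame are already available (the GMM/MMSE estimation itself is not reproduced here), and it uses the textbook Wiener gain, speech power divided by speech plus noise power. The function names and the `floor` parameter are hypothetical.

```python
import math


def wiener_gain(speech_psd, noise_psd, floor=1e-12):
    """Per-bin Wiener gain Ps / (Ps + Pn); `floor` guards against 0/0."""
    return [ps / max(ps + pn, floor)
            for ps, pn in zip(speech_psd, noise_psd)]


def estimate_snr_db(speech_psd, noise_psd, floor=1e-12):
    """Frame SNR in dB from the estimated speech and noise powers."""
    ps = max(sum(speech_psd), floor)
    pn = max(sum(noise_psd), floor)
    return 10.0 * math.log10(ps / pn)


def enhance_frame(noisy_spectrum, speech_psd, noise_psd):
    """Scale each bin of the noisy spectrum by its Wiener gain.
    Works for real magnitudes or complex spectral values."""
    gains = wiener_gain(speech_psd, noise_psd)
    return [g * x for g, x in zip(gains, noisy_spectrum)]
```

With overlapping frames, each enhanced frame would be transformed back to the time domain and combined by overlap-add; that step is left out of this sketch.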