Vector Taylor Series (VTS) model based compensation approach has been successfully applied to various robust speech recognition tasks. In this paper, a novel method to derive the formula to calculate the static and dynamic statistics based on second-order VTS (sVTS) is presented, which provides a new insight on the VTS approximation. Lengthy derivation could therefore be avoided when high order VTS is used and the proposed approach is more compact and easier to implement compared to previous high order VTS approaches. Experiments on Aurora 4 showed that the proposed sVTS based model compensation approach obtained 16.7% relative WER reduction over traditional first-order VTS (fVTS) approach.
We investigate a novel approach to spatial filtering that is adaptive to conditions at different time-frequency (TF) points for noise removal by taking advantage of speech sparsity. Our approach combines a noise reduction beamformer with a minimum variance distortionless response (MVDR) beamformer or Generalized Eigenvalue (GEV) beamformer through TF posterior probabilities of speech presence (PPSP). To estimate PPSP, we study both statistical model-based and neural network based methods, where in the former, we use complex Gaussian mixture modeling (CGMM) on temporally augmented spatial spectral features, and in the latter, we use neural network (NN) based TF masks to initialize speech and noise covariance matrices in CGMM. We have conducted experiments on CHiME-3 task. On its real noisy speech test set, our methods of feature augmentation, TF dependent spatial filter, and NN-based mask initialization on covariances for CGMM have yielded relative word error rate (WER) reductions cumulatively by 8%, 16%, and 25% over the original CGMM based MVDR. On the real test data, the three methods have also produced consistent WER reductions when replacing MVDR by GEV.
Recently, Stochastic Variational Inference (SVI) has been increasingly attractive thanks to its ability to find good posterior approximations of probabilistic models. It optimizes the variational objective with stochastic optimization, following noisy estimates of the natural gradient. However, almost all the state-of-the-art SVI algorithms are based on first-order optimization algorithm and often suffer from poor convergence rate. In this paper, we bridge the gap between secondorder methods and stochastic variational inference by proposing a second-order based stochastic variational inference approach. In particular, firstly we derive the Hessian matrix of the variational objective. Then we devise two numerical schemes to implement second-order SVI efficiently. Thorough empirical evaluations are investigated on both synthetic and real dataset to backup both the effectiveness and efficiency of the proposed approach.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.