In this paper we present a method to derive Mel-frequency cepstral coefficients directly from the power spectrum of a speech signal. We show that omitting the filterbank in signal analysis does not affect the word error rate. The presented approach simplifies the speech recognizer's front end by merging subsequent signal analysis steps into a single one. It avoids possible interpolation and discretization problems and results in a compact implementation. We show that frequency warping schemes like vocal tract normalization (VTN) can be integrated easily into our concept without additional computational effort. Recognition test results obtained with the RWTH large vocabulary speech recognition system are presented for two different corpora: the German VerbMobil II dev99 corpus and the English North American Business News 94 20k development corpus.
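To illustrate the idea of skipping the filterbank, the following is a minimal sketch, not the paper's implementation. It assumes a common Mel formula (`2595 * log10(1 + f / 700)`), which the abstract does not specify, and applies a cosine transform on the Mel-warped frequency axis directly to the log power spectrum, weighting each FFT bin by the width it occupies on the warped axis. The names `mel_warp` and `warped_cepstrum` are hypothetical.

```python
import numpy as np

def mel_warp(freq_hz):
    # A widely used Mel-scale formula; an assumption, not taken from the abstract.
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

def warped_cepstrum(power_spectrum, sample_rate, num_coeffs=13):
    """Cepstral coefficients computed on the power spectrum without a filterbank.

    The cosine transform is evaluated on the Mel-warped frequency axis, and each
    bin is weighted by its width on that axis (trapezoidal rule).
    """
    power_spectrum = np.asarray(power_spectrum, dtype=float)
    num_bins = power_spectrum.shape[0]
    # Linear frequencies of the bins from 0 Hz up to the Nyquist frequency.
    freqs = np.linspace(0.0, sample_rate / 2.0, num_bins)
    # Warped frequencies, normalized to the interval [0, pi].
    warped = np.pi * mel_warp(freqs) / mel_warp(sample_rate / 2.0)
    # Trapezoidal integration weights on the warped axis; they sum to pi.
    diffs = np.diff(warped)
    weights = np.empty(num_bins)
    weights[0] = diffs[0] / 2.0
    weights[-1] = diffs[-1] / 2.0
    weights[1:-1] = (diffs[:-1] + diffs[1:]) / 2.0
    # Floor the spectrum to avoid taking the logarithm of zero.
    log_spec = np.log(np.maximum(power_spectrum, 1e-10))
    k = np.arange(num_coeffs)[:, None]
    basis = np.cos(k * warped[None, :])
    return (basis * weights[None, :]) @ log_spec / np.pi
```

Under these assumptions, a further warping such as VTN could be folded in by composing it with `mel_warp` before the cosine basis is built, so the per-frame cost would stay the same.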
In this paper we present different approaches to structuring covariance matrices within statistical classifiers. This is motivated by the fact that the use of full covariance matrices is infeasible in many applications. On the one hand, this is due to the high number of model parameters that have to be estimated; on the other hand, the computational complexity of a classifier based on full covariance matrices is very high. We propose the use of diagonal and band matrices to replace full covariance matrices, and we also show that computation of tangent distance is equivalent to using a structured covariance matrix within a statistical classifier.
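The parameter savings from diagonal and band structures can be made concrete with a small sketch. This is an illustration only; the helper names `band_mask`, `num_free_parameters`, and `structure_covariance` are hypothetical, and simply zeroing entries outside the band is an assumption about how the structure might be imposed, not the estimation procedure of the paper.

```python
import numpy as np

def band_mask(dim, bandwidth):
    # True where |i - j| <= bandwidth; bandwidth 0 gives a diagonal matrix.
    idx = np.arange(dim)
    return np.abs(idx[:, None] - idx[None, :]) <= bandwidth

def num_free_parameters(dim, bandwidth):
    # A symmetric band matrix is determined by its main diagonal plus
    # `bandwidth` off-diagonals on one side.
    b = min(bandwidth, dim - 1)
    return sum(dim - k for k in range(b + 1))

def structure_covariance(cov, bandwidth):
    # Keep only entries inside the band. Caveat: the result is not guaranteed
    # to remain positive definite for 0 < bandwidth < dim - 1.
    cov = np.asarray(cov, dtype=float)
    return np.where(band_mask(cov.shape[0], bandwidth), cov, 0.0)
```

For a 256-dimensional feature vector, `num_free_parameters(256, 255)` gives 32896 parameters for the full matrix, while `num_free_parameters(256, 0)` gives 256 for the diagonal case, with band matrices falling in between.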