Cepstral normalization has widely been used as a powerful approach to produce robust features for speech recognition. Good examples of this approach include Cepstral Mean Subtraction, and Cepstral Mean and Variance Normalization, in which either the first or both the first and the second moments of the Mel-frequency Cepstral Coefficients (MFCCs) are normalized. In this chapter, we propose the family of Higher Order Cepstral Moment Normalization, in which the MFCC parameters are normalized with respect to a few moments of orders higher than 1 or 2. The basic idea is that the higher order moments are more dominated by samples with larger values, which are very likely the primary sources of the asymmetry and abnormal flatness or tail size of the parameter distributions. Normalization with respect to these moments therefore puts more emphasis on these signal components and constrains the distributions to be more symmetric with more reasonable flatness and tail size. The fundamental principles behind this approach are also analyzed and discussed based on the statistical properties of the distributions of the MFCC parameters. Experimental results based on the AURORA 2, AURORA 3, AURORA 4 testing environments show that with the proposed approach, recognition accuracy can be significantly and consistently improved for all types of noise and all SNR conditions.
Two approaches for modulation spectrum equalization are proposed for robust feature extraction in speech recognition. In both cases the temporal trajectories of the feature parameters are first transformed into the modulation spectrum. In the spectral histogram equalization (SHE) approach, we equalize the histogram of the modulation spectrum for each utterance to a reference histogram obtained from clean training data. In the magnitude ratio equalization (MRE) approach, we equalize the magnitude ratio of lower to higher frequency components on the modulation spectrum to a reference value also obtained from clean training data. Preliminary experimental results performed on the AURORA 2 testing environment indicate that significant performance improvements are achievable with these approaches, when integrated with cepstral mean and variance normalization (CMVN), for all testing sets A, B, and C, all types of noise, for all SNR values. We also show that the approach of magnitude ratio equalization (MRE) offers additional performance improvements when integrated with other more advanced feature normalization approaches such as histogram equalization (HEQ) and higher-order cepstral moment normalization (HOCMN).
This E-book is a collection of articles that describe advances in speech recognition technology. Robustness in speech recognition refers to the need to maintain high speech recognition accuracy even when the quality of the input speech is degraded, or when the acoustical, articulate, or phonetic characteristics of speech in the training and testing environments differ. Obstacles to robust recognition include acoustical degradations produced by additive noise, the effects of linear filtering, nonlinearities in transduction or transmission, as well as impulsive interfering sources, and diminished accuracy caused by changes in articulation produced by the presence of high-intensity noise sources. Although progress over the past decade has been impressive, there are significant obstacles to overcome before speech recognition systems can reach their full potential. Automatic speech recognition (ASR) systems must be robust to all levels, so that they can handle background or channel noise, the occurrence on unfamiliar words, new accents, new users, or unanticipated inputs. They must exhibit more 'intelligence' and integrate speech with other modalities, deriving the user's intent by combining speech with facial expressions, eye movements, gestures, and other input features, and communicating back to the user through multimedia responses. Therefore, as speech recognition technology is transferred from the laboratory to the marketplace, robustness in recognition becomes increasingly significant. This E-book should be useful to computer engineers interested in recent developments in speech recognition technology.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.