The performance of an automatic speech recognition (ASR) system can be significantly enhanced with additional information from visual speech elements such as the movement of the lips, tongue, and teeth, especially in noisy environments. In this paper, a novel approach for the recognition of visual speech elements is presented. The approach uses adaptive boosting (AdaBoost) and hidden Markov models (HMMs) to build an AdaBoost-HMM classifier. The composite HMMs of the AdaBoost-HMM classifier are trained to cover different groups of training samples using the AdaBoost technique and the biased Baum-Welch training method. By combining the decisions of the component classifiers of the composite HMMs according to a novel probability synthesis rule, the classifier forms a more complex decision boundary than a single HMM classifier can. The method is applied to the recognition of basic visual speech elements. Experimental results show that the AdaBoost-HMM classifier outperforms the traditional HMM classifier in accuracy, especially for visemes extracted from contexts.
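The boosting idea behind the composite classifier (each new component concentrates on the samples earlier components got wrong) can be illustrated with a minimal sketch. This is standard binary AdaBoost, not the paper's method: hypothetical one-dimensional decision stumps stand in for the HMM components and biased Baum-Welch training, and a plain weighted vote stands in for the paper's probability synthesis rule. The names `train_stump`, `adaboost`, and `predict` are illustrative only.

```python
import math


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump.

    A stump predicts `polarity` when x >= threshold and -polarity otherwise.
    Labels are +1 / -1. (Stand-in for training one HMM component.)
    """
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (polarity if x >= thr else -polarity) != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train up to `rounds` stumps; return a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, weights)
        err = max(err, 1e-10)        # avoid division by zero for a perfect stump
        if err >= 0.5:               # no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Misclassified samples (y * prediction = -1) are up-weighted, so the
        # next component is trained to cover a different group of samples.
        weights = [w * math.exp(-alpha * y * (pol if x >= thr else -pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all components (not the paper's synthesis rule)."""
    score = sum(alpha * (pol if x >= thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

On the toy labels `[1, 1, -1, -1, 1, 1]` at `x = 1..6`, no single stump separates the classes, but three boosted stumps together do, which mirrors the claim that the combined decision boundary is more complex than that of any single component.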
The effective length of a filter designed using the frequency-response masking (FRM) technique is very long, so the filter requires a very large number of delay elements. In this paper, we present several techniques for reducing the data transfer between a field programmable gate array (FPGA) and external memory when the random logic is implemented in the FPGA and the delay elements are implemented in an external memory such as dynamic random access memory (DRAM).
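Why the delay count grows so large can be seen from the interpolated subfilters used in FRM, where every unit delay of a prototype filter H_a(z) is replaced by L delays to give H_a(z^L). The minimal sketch below assumes a plain direct-form FIR and uses a Python list as a circular buffer standing in for the external delay memory; the class name `InterpolatedFIR` is hypothetical, and the sketch does not reproduce the paper's data-transfer reduction techniques.

```python
class InterpolatedFIR:
    """Direct-form FIR realizing H_a(z^L) from prototype taps h[0..N-1].

    Tap i multiplies the input delayed by i*L samples, so the delay line
    must hold (N - 1) * L + 1 samples instead of N.
    """

    def __init__(self, taps, L):
        self.taps = list(taps)
        self.L = L
        # Delay-line length: this is the memory the FRM filter needs.
        self.size = (len(self.taps) - 1) * L + 1
        self.buffer = [0.0] * self.size   # stands in for external memory
        self.pos = 0                      # write position of the newest sample

    def step(self, x):
        """Consume one input sample and return one output sample."""
        self.buffer[self.pos] = x
        y = 0.0
        for i, h in enumerate(self.taps):
            # The sample delayed by i*L sits i*L slots behind the write position.
            y += h * self.buffer[(self.pos - i * self.L) % self.size]
        self.pos = (self.pos + 1) % self.size
        return y
```

For example, a 3-tap prototype with L = 4 already needs a 9-sample delay line, and its impulse response has nonzero taps only at samples 0, 4, and 8. Each output reads N samples scattered across the whole delay line, which is the memory traffic that becomes costly when the line lives off-chip.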
In this paper, a novel system for the detection of human stress and emotion in speech is proposed. The system uses FFT-based linear short-time Log Frequency Power Coefficients (LFPC) and Teager energy operator (TEO)-based nonlinear LFPC features in both the time and frequency domains. The performance of the proposed system is compared with that of traditional approaches that use linear prediction cepstral coefficient (LPCC) and mel-frequency cepstral coefficient (MFCC) features. The approaches are compared on the SUSAS (Speech Under Simulated and Actual Stress) and ESMBS (Emotional Speech of Mandarin and Burmese Speakers) databases. The proposed system is observed to outperform the traditional systems. The system using LFPC gives the highest accuracy (87.8% for stress and 89.2% for emotion classification), followed by the system using the NFD-LFPC feature, while the system using the NTD-LFPC feature gives the lowest accuracy.
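A rough idea of how log frequency power coefficients can be computed for one speech frame is given by the sketch below: window the frame, take its power spectrum, sum the power in logarithmically spaced subbands, and convert each band power to decibels. The band count (12), the 100 Hz lower edge, the Hamming window, and the function names are assumptions for illustration rather than the paper's exact settings, and a naive DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(frame):
    """Power |X[k]|^2 for bins k = 0..N/2 via a naive DFT (an FFT in practice)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def log_band_edges(f_low, f_high, num_bands):
    """Band edges in Hz, spaced logarithmically from f_low to f_high."""
    ratio = (f_high / f_low) ** (1.0 / num_bands)
    return [f_low * ratio ** i for i in range(num_bands + 1)]


def lfpc(frame, sample_rate, num_bands=12, f_low=100.0):
    """Log (dB) power in each log-spaced subband of one frame (assumed layout)."""
    n = len(frame)
    # Hamming window to reduce spectral leakage between subbands.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spectrum = power_spectrum(windowed)
    edges = log_band_edges(f_low, sample_rate / 2, num_bands)
    coeffs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Sum the power of every DFT bin whose centre frequency lies in [lo, hi).
        band_power = sum(p for k, p in enumerate(spectrum)
                         if lo <= k * sample_rate / n < hi)
        coeffs.append(10 * math.log10(band_power + 1e-12))  # floor avoids log(0)
    return coeffs
```

For a 256-sample frame of a 1 kHz tone sampled at 8 kHz, the largest coefficient falls in the subband spanning roughly 860 to 1170 Hz (index 7). Log-spaced bands give finer resolution at low frequencies, where much of the spectral detail of speech lies.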