This paper proposes a support vector machine (SVM) based voice activity detection (VAD) method that uses fuzzy entropy (FuzzyEn) to improve detection performance under noisy conditions. The proposed VAD uses FuzzyEn as a feature, extracted from noise-reduced speech signals, to train an SVM model for speech/non-speech classification. The method was tested in experiments in which real background noises were added, at signal-to-noise ratios (SNR) ranging from −10 dB to 10 dB, to speech signals taken from the TIMIT database. The analysis shows that the FuzzyEn feature gives better results in discriminating between noise and noise-corrupted speech. The efficacy of the SVM classifier was validated using 10-fold cross-validation. Furthermore, the results obtained by the proposed method were compared with those of previous standardized VAD algorithms as well as recently developed methods. The performance comparison suggests that the proposed method detects speech more efficiently under various noisy environments, with an accuracy of 93.29%, and that the FuzzyEn feature detects speech efficiently even at low SNR levels.
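To make the FuzzyEn feature concrete, the following is a minimal sketch of a fuzzy entropy computation for a single frame, following the commonly cited formulation with an exponential membership function. The abstract does not state the parameters used, so the embedding dimension `m=2`, tolerance `r=0.2`, and fuzzy power `n=2` are assumptions, as are the z-score normalization of the frame and the function name `fuzzy_entropy`. It is not the authors' implementation, and the noise reduction and SVM training steps are not shown.

```python
import math
import random


def fuzzy_entropy(signal, m=2, r=0.2, n=2):
    """Fuzzy entropy of a 1-D sequence (illustrative sketch).

    m, r and n are assumed defaults, not values taken from the paper.
    The frame is z-scored first so that r is relative to unit variance.
    """
    N = len(signal)
    if N <= m + 1:
        raise ValueError("signal is too short for the chosen m")

    mean = sum(signal) / N
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / N)
    if std == 0:
        return 0.0  # a constant frame carries no irregularity
    z = [(x - mean) / std for x in signal]

    def phi(dim):
        # Use N - m vectors for both dim = m and dim = m + 1.
        count = N - m
        vectors = []
        for i in range(count):
            window = z[i:i + dim]
            base = sum(window) / dim
            # Remove the local baseline from each embedding vector.
            vectors.append([x - base for x in window])
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i == j:
                    continue
                # Chebyshev distance between the two vectors.
                d = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
                # Exponential fuzzy membership (degree of similarity).
                total += math.exp(-(d ** n) / r)
        return total / (count * (count - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))


if __name__ == "__main__":
    random.seed(0)
    sine = [math.sin(2 * math.pi * i / 25) for i in range(200)]
    noise = [random.gauss(0.0, 1.0) for _ in range(200)]
    print("regular signal:", fuzzy_entropy(sine))
    print("white noise:   ", fuzzy_entropy(noise))
```

A regular signal such as the sine wave yields a lower value than white noise, which is the kind of separation a classifier can exploit. In a VAD pipeline, one such value per frame would form the feature passed to the classifier.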
The real challenge in human-computer interaction is enabling machines to understand human emotions and respond to them accordingly. Emotion varies with the gender and age of the speaker, location, and cause. This article focuses on improving emotion recognition (ER) from speech by exploiting gender-specific differences in emotional expression. The problem is addressed by testing emotional speech with the appropriate gender-specific ER system. Because acoustical characteristics vary between the genders, there may not be a common optimal feature set across both genders. A gender-based speech emotion recognition system with a two-level hierarchy is proposed: the first level identifies the speaker's gender, and the second level is a gender-specific ER system trained with an optimal feature set for the emotional expressions of that gender. When tested on the EMO-DB corpus, the proposed system improves accuracy by 10.36% over a traditional speech emotion recognition (SER) system trained on mixed-gender data.
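The two-level routing described above can be sketched as follows. This is only an illustration of the control flow, under stated assumptions: the class name `HierarchicalEmotionRecognizer`, the callable-based model interface, and the per-gender feature index lists are all hypothetical, and the abstract does not specify which classifiers or features the authors used.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]


class HierarchicalEmotionRecognizer:
    """Level 1 picks the gender; level 2 runs that gender's ER model."""

    def __init__(
        self,
        gender_model: Classifier,
        emotion_models: Dict[str, Classifier],
        feature_indices: Dict[str, Sequence[int]],
    ) -> None:
        # Every gender-specific model needs its own feature subset.
        if set(emotion_models) != set(feature_indices):
            raise ValueError("each gender needs a model and a feature subset")
        self.gender_model = gender_model
        self.emotion_models = emotion_models
        self.feature_indices = feature_indices

    def predict(self, features: Features) -> Tuple[str, str]:
        # Level 1: gender identification on the full feature vector.
        gender = self.gender_model(features)
        if gender not in self.emotion_models:
            raise KeyError(f"no emotion model for gender label {gender!r}")
        # Level 2: keep only the features selected for this gender.
        selected = [features[i] for i in self.feature_indices[gender]]
        return gender, self.emotion_models[gender](selected)


if __name__ == "__main__":
    # Toy stand-ins for trained models (hypothetical thresholds).
    recognizer = HierarchicalEmotionRecognizer(
        gender_model=lambda f: "female" if f[0] > 180.0 else "male",
        emotion_models={
            "female": lambda f: "anger" if f[0] > 0.5 else "neutral",
            "male": lambda f: "anger" if f[0] > 0.7 else "neutral",
        },
        feature_indices={"female": [1], "male": [2]},
    )
    print(recognizer.predict([210.0, 0.9, 0.1]))
```

Keeping a separate feature subset per gender mirrors the observation that a single optimal feature set may not exist across both genders.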