Feature selection is one of the aspects that contributes most to the performance of an emotion recognition system, alongside the database and the classification technique used. Previous findings suggest that Mel-Frequency Cepstral Coefficients (MFCC) are well suited to emotion recognition. This paper discusses the use of MFCC features to recognize human emotion on the Berlin database of German speech. Global features are extracted from the MFCCs and tested with three classification methods: Naive Bayes, Artificial Neural Network (ANN), and Support Vector Machine (SVM). We investigate the capability of MFCC global features, using 13-, 26-, and 39-dimensional cepstral features, to recognize emotions from speech. The experimental results are discussed in detail in this paper.
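The abstract does not specify how the global features are derived, but a common approach is to collapse the frame-level MFCC matrix of an utterance into per-coefficient statistics. A minimal sketch of that idea, assuming mean and standard deviation as the global statistics (the actual statistics used in the paper may differ):

```python
import numpy as np

def global_mfcc_features(mfcc_frames):
    """Collapse frame-level MFCCs (n_frames x n_coeffs) into a single
    utterance-level "global" vector of per-coefficient means and
    standard deviations."""
    mean = mfcc_frames.mean(axis=0)
    std = mfcc_frames.std(axis=0)
    return np.concatenate([mean, std])

# Illustrative stand-in for the 13-dimensional MFCCs of a 200-frame utterance;
# a real pipeline would extract these from audio with an MFCC front end.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))
feat = global_mfcc_features(frames)
print(feat.shape)  # (26,)
```

The resulting fixed-length vector can be fed directly to classifiers such as Naive Bayes, an ANN, or an SVM, regardless of utterance duration.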
Research on incorporating emotion into intelligent machines is expanding and improving. Human speech conveys various emotional states, and finding reliable speech features is an ongoing research problem. Which specific features of the speech signal carry emotional information remains uncertain; it is an extremely challenging question that continues to be explored. Recognition rates for emotion in speech vary depending on the features used in the experiment and on the database itself. Prosodic, spectral, and wavelet features are the most commonly used, and a central question is which of these features, or which hybrid of them, carries more information about emotion. This paper summarizes and reviews previous work on single and hybrid features based on prosodic, spectral, and wavelet features.
Accurate detection of human emotion is crucial in industry to ensure effective conversation and message delivery. The process of identifying emotions must be carried out properly, using a method that guarantees a high level of recognition accuracy. Energy is said to encode prosodic information, and ongoing studies of energy in speech prosody motivated us to run experiments on energy features. We conducted two sets of studies: 1) whether local or global features contribute most to emotion recognition, and 2) the effect of end-part segment length on recognition accuracy using two segmentation approaches. This paper discusses the Absolute Time Intervals at Relative Positions (ATIR) segmentation approach and global ATIR (GATIR), using end-part segmented global energy features extracted from the Berlin Emotional Speech Database (EMO-DB). We observed that global features contribute more to emotion recognition, and that global features derived from longer segments give higher recognition accuracy than those derived from short segments. Adding an utterance-based feature (GTI) to ATIR segmentation increases accuracy by 5% to 8%, leading us to conclude that GATIR outperforms the ATIR segmentation approach in recognition rate. Almost all sub-tests showed improved results, supporting the conclusion that global features derived from longer segment lengths capture more emotional information and enhance system performance.
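The abstract does not detail the feature computation, but the core idea of an end-part segmented global energy feature can be sketched as follows. This is a minimal illustration, assuming short-time energy as the sum of squared samples per frame and mean/std/max as the global statistics; the paper's actual frame sizes, segment lengths, and statistics may differ:

```python
import numpy as np

def frame_energy(x, frame_len=400, hop=160):
    """Short-time energy: sum of squared samples in each frame
    (400-sample frames with a 160-sample hop are assumed here)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def end_segment_global_energy(x, sr, seg_seconds):
    """Global energy statistics (mean, std, max) computed over only
    the last `seg_seconds` of the utterance (the end-part segment)."""
    seg = x[-int(seg_seconds * sr):]
    e = frame_energy(seg)
    return np.array([e.mean(), e.std(), e.max()])
```

Varying `seg_seconds` mimics the study's comparison of segment lengths: longer end-part segments cover more frames, so their global statistics summarize more of the utterance's emotional content.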