Speech emotion recognition plays a key role in human-machine interaction systems. Recognition of categorical emotions has improved greatly over the last few decades, but emotion recognition from spontaneous speech remains very challenging. This paper investigates emotion recognition from spontaneous speech using the three-dimensional model, in which each dimension represents one primitive, generic attribute of an emotion. We introduce middle levels for each dimension and employ an LSTM network to estimate the dimensions, owing to its effectiveness in speech emotion recognition. In experiments on the IEMOCAP database, the accuracy is 30-35%, and the confusion matrices show that our method yields a more concentrated dimension location. Furthermore, the estimated dimensions were applied to categorical emotion recognition. These results indicate that increasing the number of dimension levels makes dimension estimation feasible, and suggest that dimensions can be used to improve speech emotion recognition.
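To make the setup concrete, the pipeline described above (an LSTM that consumes frame-level acoustic features and predicts a discretized level for each of the three emotion dimensions) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the feature size, hidden size, number of levels, and the dimension names valence/activation/dominance are all assumptions made for the example.

```python
import numpy as np

# Illustrative sizes only; the paper's actual configuration may differ.
N_FEATS = 32      # acoustic features per frame (e.g. MFCCs, pitch, energy)
HIDDEN = 16       # LSTM hidden size
N_LEVELS = 5      # discretized levels per dimension, including middle levels
DIMENSIONS = ("valence", "activation", "dominance")

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Single-layer LSTM followed by one softmax head per emotion dimension."""

    def __init__(self):
        z = N_FEATS + HIDDEN
        # Stacked gate weights: input, forget, candidate, output.
        self.W = rng.standard_normal((4 * HIDDEN, z)) * 0.1
        self.b = np.zeros(4 * HIDDEN)
        # One linear classification head per dimension.
        self.heads = {d: rng.standard_normal((N_LEVELS, HIDDEN)) * 0.1
                      for d in DIMENSIONS}

    def forward(self, frames):
        h = np.zeros(HIDDEN)
        c = np.zeros(HIDDEN)
        for x in frames:                       # one acoustic frame per step
            g = self.W @ np.concatenate([x, h]) + self.b
            i, f, u, o = np.split(g, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(u)
            h = sigmoid(o) * np.tanh(c)
        # The final hidden state summarizes the utterance; each head then
        # produces a probability distribution over the discretized levels.
        out = {}
        for d, Wd in self.heads.items():
            logits = Wd @ h
            e = np.exp(logits - logits.max())
            out[d] = e / e.sum()
        return out

model = TinyLSTM()
utterance = rng.standard_normal((40, N_FEATS))  # 40 frames of features
probs = model.forward(utterance)
for dim, p in probs.items():
    print(dim, "-> level", int(p.argmax()))
```

In practice the network would be trained (e.g. with cross-entropy loss per dimension) on labeled utterances; the untrained weights here only demonstrate the data flow from frame sequence to per-dimension level distributions.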