Speech emotion recognition (SER) is an important technology for human–computer interaction, and this paper studies the features and modeling of SER. The mel-spectrogram is used as the speech feature, and its theory and extraction process are presented in detail. A deep residual shrinkage network with a bi-directional gated recurrent unit (DRSN-BiGRU) is proposed, which is composed of a convolutional network, a residual shrinkage network, a bi-directional gated recurrent unit, and a fully connected network. Through the self-attention mechanism, DRSN-BiGRU can automatically ignore noisy information and improve its ability to learn effective features. Network optimization and verification experiments are carried out on three emotional datasets (CASIA, IEMOCAP, and MELD), on which DRSN-BiGRU achieves accuracies of 86.03%, 86.07%, and 70.57%, respectively. The results are also analyzed and compared with those of DCNN-LSTM, CNN-BiLSTM, and DRN-BiGRU, which verifies the superior performance of DRSN-BiGRU.
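The abstract does not spell out how the residual shrinkage network suppresses noisy information. The sketch below illustrates soft thresholding, the shrinkage operation that residual shrinkage networks are generally built around, and is not the authors' implementation. It assumes the threshold is a scaled mean of the absolute feature values; the function names and the fixed `alpha` are hypothetical, with `alpha` standing in for a factor that an attention sub-network would normally learn.

```python
import math


def soft_threshold(x, tau):
    """Shrink x toward zero by tau; any |x| <= tau is set to zero."""
    return math.copysign(max(abs(x) - tau, 0.0), x)


def shrink_channel(values, alpha):
    """Soft-threshold one feature channel.

    Assumption: tau = alpha * mean(|values|), with alpha in (0, 1).
    In a trained network alpha would come from a small attention
    sub-network; here it is a fixed, hypothetical input.
    """
    tau = alpha * sum(abs(v) for v in values) / len(values)
    return [soft_threshold(v, tau) for v in values]


if __name__ == "__main__":
    # Hypothetical feature values: small entries play the role of noise.
    channel = [4.0, -2.0, 0.5, -0.5, 1.0]
    print(shrink_channel(channel, alpha=0.5))
```

With these inputs the two small-magnitude entries are zeroed while the larger ones are reduced by the threshold but keep their sign, which is the sense in which shrinkage can discard low-amplitude, noise-like activations.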
Text emotion recognition (TER) is an important natural language processing (NLP) task that is widely used in human–computer interaction, public opinion analysis, mental health analysis, and social network analysis. In this paper, a deep learning model based on XLNet with a bidirectional gated recurrent unit and an attention mechanism (XLNet-BiGRU-Att) is proposed to improve TER performance. XLNet is used to build a bidirectional language model that learns contextual information from both directions simultaneously. The bidirectional gated recurrent unit (BiGRU) uses its hidden layers to extract more effective features that attend to both current and previous states. The attention mechanism (Att) assigns different weights to enhance the attention paid to important information, thereby improving the quality of the word vectors and the accuracy of the model's sentiment judgments. The proposed model, composed of XLNet, BiGRU, and Att, improves performance on the TER task as a whole. Experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database and the Chinese Academy of Sciences Institute of Automation (CASIA) dataset were carried out to compare XLNet-BiGRU-Att with XLNet, BERT, and BERT-BiLSTM, and the results show that the proposed model outperforms the others.
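The way an attention layer "provides different weights" to sequence positions can be sketched as follows. The abstract does not state which scoring function is used, so this example assumes simple dot-product scoring against a query vector followed by a softmax; the names `attention_pool` and `query`, and the toy hidden states standing in for BiGRU outputs, are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query), then sum.

    Assumption: dot-product scoring with a single query vector; the
    paper's actual scoring function is not given in the abstract.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    # Toy stand-ins for BiGRU outputs at three time steps.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, pooled = attention_pool(states, query=[1.0, 1.0])
    print(weights, pooled)
```

The weights sum to one, and the time step whose hidden state aligns best with the query receives the largest share, so it dominates the pooled sentence representation.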