2017
DOI: 10.12783/dtcse/cii2017/17273

Speech Emotion Recognition Using Convolutional-Recurrent Neural Networks with Attention Model

Abstract: Speech Emotion Recognition (SER) plays an important role in human-computer interfaces and assistant technologies. In this paper, a new method is proposed that uses distributed Convolutional Neural Networks (CNNs) to automatically learn affect-salient features from raw spectral information and then applies a Bidirectional Recurrent Neural Network (BRNN) to obtain temporal information from the CNN output. Finally, an attention mechanism is applied to the BRNN output sequence to focus on target emoti…
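The attention step described in the abstract can be illustrated with a minimal sketch of attention pooling. It assumes the common formulation in which each BRNN output vector h_t is scored against a context vector u, the scores are normalized with a softmax, and the outputs are summed with those weights. The abstract does not give the paper's exact scoring function, so the dot-product score, the context vector `u`, the toy values, and the helper names `softmax` and `attention_pool` are illustrative assumptions; in a trained model `u` would be learned.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    outputs: Sequence[Sequence[float]], u: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Score each time step's output h_t by the dot product u . h_t, normalize the
    scores with softmax, and return (weights, weighted sum of the outputs)."""
    scores = [sum(h_d * u_d for h_d, u_d in zip(h, u)) for h in outputs]
    weights = softmax(scores)
    dim = len(outputs[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, outputs)) for d in range(dim)]
    return weights, pooled


# Toy BRNN output sequence: 3 time steps, 2 units each (hypothetical values).
H = [[0.1, 0.2], [1.0, 0.5], [0.0, -0.3]]
u = [1.0, 1.0]  # hypothetical context vector; learned during training in practice
weights, pooled = attention_pool(H, u)
print(weights, pooled)
```

Time steps whose outputs align with `u` receive larger weights, so the pooled vector is dominated by those steps; in this toy input the second step (score 1.5) gets the largest weight.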

Cited by 11 publications (7 citation statements); references 14 publications (13 reference statements).

Selected citation statements:
“…WA is the overall accuracy, calculated as the ratio of the number of test samples whose predicted label matches the actual label to the total number of test samples. UA is calculated as the average of the recall values of the four classes and is an important performance indicator when evaluating SER models on imbalanced datasets [19, 20, 26].…”
(Cited in: Discussion)
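The difference between WA and UA described in the statement above can be made concrete with a short sketch. The four label names and the toy predictions are hypothetical; the example shows how a classifier that always predicts the majority class scores well on WA but poorly on UA, which is why UA is preferred on imbalanced data.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """WA: correctly predicted samples divided by the total number of test samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def unweighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """UA: mean of per-class recall over the classes present in y_true."""
    support = Counter(y_true)  # number of test samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set with four classes.
y_true = ["neutral"] * 6 + ["angry"] * 2 + ["happy", "sad"]
y_pred = ["neutral"] * 10  # a majority-class predictor

print(weighted_accuracy(y_true, y_pred))    # 0.6
print(unweighted_accuracy(y_true, y_pred))  # 0.25
```

With six of ten samples labelled neutral, predicting neutral everywhere gives WA = 0.6, but UA = (1 + 0 + 0 + 0) / 4 = 0.25 because the recall for each of the three minority classes is zero.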
“…Recent SER models based on deep-learning architectures [19–30] have demonstrated state-of-the-art performance with an attention mechanism [19, 20, 22, 23, 25, 26]. The deep-learning architectures adopted in previous studies included recurrent neural networks (RNN) [19], convolutional neural networks (CNN) [24], and convolutional RNNs (CRNN) [20, 26]. Liu et al. [21] presented an SER model using a decision tree with an extreme learning machine having a single-hidden-layer feed-forward neural network, combining deep learning with typical classification techniques.…”
(Cited in: Related Work)
“…Yawei Mu et al. [10] proposed a new method using distributed Convolutional Neural Networks (CNN) [11] to automatically learn affect-salient features from raw spectral information and then applied a Bidirectional Recurrent Neural Network (BRNN) to obtain temporal information from the CNN output, but even with this method the accuracy achieved is 64.08%.…”
(Cited in: CNN Classifier [5])
“…The experimental results showed the high performance of the proposed method on the IEMOCAP (Busso et al., 2008) and CHEAVD (Li et al., 2017) datasets. Mu et al. (2017) used a distributed convolutional neural network (CNN) to automatically learn emotion features from the raw speech spectrum, and a bidirectional RNN (BRNN) to obtain temporal information from the CNN output. Finally, the BRNN output sequence was weighted by an attention mechanism to focus on the emotion-relevant parts.…”
(Cited in: Introduction)
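The bidirectional step mentioned in these statements can be sketched as two independent recurrent passes over the sequence of per-segment CNN feature vectors, one forward in time and one backward, whose hidden states are concatenated at each time step. This is a minimal sketch assuming a plain tanh (Elman) recurrence with hand-set toy weights; the statements do not say which recurrent cell the paper uses (LSTM or GRU cells are common choices), and the names `rnn_pass` and `birnn` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Tuple[Matrix, Matrix, Vector]  # (W_x input weights, W_h recurrent weights, b bias)


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rnn_pass(xs: Sequence[Sequence[float]], params: Params) -> List[Vector]:
    """Run a tanh RNN over xs and return the hidden state at every step."""
    W_x, W_h, b = params
    h: Vector = [0.0] * len(b)  # initial hidden state is all zeros
    outputs: List[Vector] = []
    for x in xs:
        pre = [a + c + bias for a, c, bias in zip(matvec(W_x, x), matvec(W_h, h), b)]
        h = [math.tanh(v) for v in pre]
        outputs.append(h)
    return outputs


def birnn(xs: Sequence[Sequence[float]], fwd: Params, bwd: Params) -> List[Vector]:
    """Concatenate forward and backward hidden states for each time step."""
    forward_states = rnn_pass(xs, fwd)
    # Run over the reversed sequence, then flip the results so index t lines up with x_t.
    backward_states = rnn_pass(list(reversed(xs)), bwd)[::-1]
    return [fw + bw for fw, bw in zip(forward_states, backward_states)]


# Toy per-segment CNN features (3 time steps, 2 features each) and hand-set weights.
features = [[0.5, -0.1], [0.2, 0.4], [-0.3, 0.8]]
fwd_params: Params = ([[0.6, -0.2], [0.1, 0.5]], [[0.3, 0.0], [0.0, 0.3]], [0.0, 0.1])
bwd_params: Params = ([[-0.4, 0.2], [0.7, 0.1]], [[0.2, 0.1], [0.1, 0.2]], [0.05, 0.0])

states = birnn(features, fwd_params, bwd_params)
print(len(states), len(states[0]))  # 3 time steps, 4 = 2 forward + 2 backward units
```

Each concatenated state summarizes both the past (forward pass) and the future (backward pass) of the utterance at that step, which is the sequence an attention layer would then weight and pool.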