2021
DOI: 10.3390/electronics10101163

A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism

Abstract: Emotions are an integral part of human interactions and are significant factors in determining user satisfaction or customer opinion. Speech emotion recognition (SER) modules also play an important role in the development of human–computer interaction (HCI) applications. A tremendous number of SER systems have been developed over the last decades. Attention-based deep neural networks (DNNs) have been shown as suitable tools for mining information that is unevenly time distributed in multimedia content. The att…

Cited by 114 publications (67 citation statements)
References 113 publications
“…Recent tremendous results in speech emotion recognition (SER) have been focused on the utilizations of deep learning and convolutional networks [7], [8], [9], [10], [11], [12]. The task is also investigated in Arabic speech emotion recognition (ASER) in several recent results [13], [14], [15].…”
Section: Related Work (mentioning)
confidence: 99%
“…Overall, the precision, recall, and f1-score values obtained were very similar to the recognition accuracy as we can see from Tables 5-11, and the AUC values were also quite close to 1. Tables 6-11 show that the common point of CNN, CRNN, and GRU models were that the highest precision, recall, and f1-score were achieved with the "sadness" emotion, and the lowest recall and f1-score were for the "happiness" ("excitement") emotion and for both sets of parameters. The lowest precision was for the emotions of "excitement" or "anger."…”
Section: Results (mentioning)
confidence: 99%
“…The research in [7] has surveyed and evaluated quite a significant number of studies on speech emotion recognition for different corpuses including IEMOCAP [8]. IEMOCAP was a corpus collected by the Speech Analysis and Interpretation Laboratory (SAIL) at the University of Southern California (USC).…”
Section: Related Work (mentioning)
confidence: 99%
“…They are now also used in speaker recognition [29]. The approaches that have been successfully applied in speaker recognition are often adopted in emotion recognition (see e.g., [30][31][32]).…”
Section: System Architecture (mentioning)
confidence: 99%