2021
DOI: 10.3390/s21041249

Deep Learning Techniques for Speech Emotion Recognition, from Databases to Models

Abstract: The advancements in neural networks and the on-demand need for accurate and near real-time Speech Emotion Recognition (SER) in human–computer interactions make it mandatory to compare available methods and databases in SER to achieve feasible solutions and a firmer understanding of this open-ended problem. The current study reviews deep learning approaches for SER with available datasets, followed by conventional machine learning techniques for speech emotion recognition. Ultimately, we present a multi-aspect …

Cited by 192 publications (73 citation statements)
References 85 publications
“…The aforementioned observation was proven by several researchers [3][4][5][6][7][8][9][10][11][12][13][14]. DL is beneficial in other fields, including target recognition [15], speech recognition [16,17], image recognition [18][19][20], image restoration [21][22][23], audio classification [24,25], object detection [26][27][28][29][30], scene recognition [31], etc., but it has been considered "bad news" in text-based CAPTCHAs, by penetrating their security and making them vulnerable.…”
Section: Introduction; citation type: mentioning
confidence: 98%
“…In [39], a CNN-based architecture was tested on the acted emotional speech dynamic database (AESDD) by analyzing the sequential time frames and its performance surpassed some existing techniques, which rely on handcrafted features. Very recently, Abbaschian et al. [40] comprehensively and systematically reviewed major deep learning approaches employed in the speech emotion recognition research, combined with their associated speech databases. This review indicates that CNN-based approaches play an important role in speech emotion recognition.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…While affective computing has received more attention in humans due to its widespread implications for e.g. neuromarketing (5), entertainment (5), monitoring mental health (6, 7) and human-robot interactions (8), the application of affective computing on farm animal welfare is however in its infancy and therefore, more multidisciplinary and exploratory studies are needed to further develop this highly promising field.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Human affective research is only just starting to understand the importance of multimodal approaches, and this knowledge should be used to develop models for non-human animals, too. Research on humans have resulted in complex systems that allow accurate sentiment analysis, preference detection (5) using qualitative and quantitative data such as facial expressions, body gestures, phonetic and acoustic properties of spoken language, word use and grammar in written text and more (8, 68). Different algorithms have been designed using computational methodologies such as hidden Markov chains, Bayesian networks and Gaussian mixture models to e.g.…”
Section: Introduction; citation type: mentioning
confidence: 99%