The Emotion Probe: On the Universality of Cross-Linguistic and Cross-Gender Speech Emotion Recognition via Machine Learning

Costantini, Giovanni; Parada-Cabaleiro, Emilia; Casali, Daniele; Cesarini, Valerio

doi:10.3390/s22072461

Cited by 27 publications

(20 citation statements)

References 64 publications


“…First of all, traditional ML algorithms could provide reliable results in the case of small-to-medium size of training data [51][52][53]. Second, SVM and MLP classifiers often demonstrate better performance than others [22,23,54]. Some experiments have shown a superiority of SVM and MLP for emotion recognition over classical Random Forest, K-NN, etc.…”

Section: Classifiers (mentioning)

confidence: 99%

“…MLP is the "basic" example of NN. It is stated in [22] that MLP is the most effective speech emotion classifier, with accuracies higher than 90% for single-language approaches, followed closely by SVM. The results show that MLP outperforms SVM in overall emotion classification performance, and even though SVM training is faster compared to MLP, the ultimate accuracy of MLP is higher than that of SVM [57].…”

Section: Classifiers (mentioning)

confidence: 99%

“…It is noted in [22] that neural networks and Support Vector Machine (SVM) behave differently, and each have advantages and disadvantages for building a SER; however, both are relevant for the task. Multi-Layer Perceptron (MLP) is the "basic" example of a neural network.…”

Section: Introduction (mentioning)

confidence: 99%


Automatic Speech Emotion Recognition of Younger School Age Children

Matveev, Matveev, Frolova et al. 2022

Mathematics


This paper introduces the extended description of a database that contains emotional speech in the Russian language of younger school age (8–12-year-old) children and describes the results of validation of the database based on classical machine learning algorithms, such as Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP). The validation is performed using standard procedures and scenarios of the validation similar to other well-known databases of children’s emotional acting speech. Performance evaluation of automatic multiclass recognition on four emotion classes “Neutral (Calm)—Joy—Sadness—Anger” shows the superiority of SVM performance and also MLP performance over the results of perceptual tests. Moreover, the results of automatic recognition on the test dataset which was used in the perceptual test are even better. These results prove that emotions in the database can be reliably recognized both by experts and automatically using classical machine learning algorithms such as SVM and MLP, which can be used as baselines for comparing emotion recognition systems based on more sophisticated modern machine learning methods and deep neural networks. The results also confirm that this database can be a valuable resource for researchers studying affective reactions in speech communication during child-computer interactions in the Russian language and can be used to develop various edutainment, health care, etc. applications.


“…Properly validated AI tools can reduce the possible subjectivity bias and “enrich” the scale when applied to the vocal test (even enabling daily evaluations), as different studies have already demonstrated [11, 12, 20, 21, 22]. However, the human voice can be potentially influenced by other issues ranging from environmental conditions to subject-specific characteristics [23, 24, 25], so that other forms of evidence are mandatory. In particular for PD, the effect of medication on speech production is still poorly addressed, with results ranging from no effects [26] to meaningful ones [27], while the differences can even depend on the specific phonemes investigated [23, 28, 29].…”

Section: Introduction (mentioning)

confidence: 99%

“…Other than being a reliable means to non-empirically quantify voice impairment in diseases that affect phonatory production, voice analysis is also a completely non-invasive, low-cost and pseudo-real-time solution for deploying telemedicine assessments. Voice-based AI solutions have been successfully experimentally investigated and employed in other medical fields such as dysphonia [31, 32, 33], COVID-19 and pulmonary diseases [20, 22, 34, 35], and even emotion and stress recognition [24, 36].…”

Section: Introduction (mentioning)

confidence: 99%

Artificial Intelligence-Based Voice Assessment of Patients with Parkinson’s Disease Off and On Treatment: Machine vs. Deep-Learning Comparison

Costantini, Cesarini, Leo et al. 2023

Sensors

Self Cite


Parkinson’s Disease (PD) is one of the most common non-curable neurodegenerative diseases. Diagnosis is achieved clinically on the basis of different symptoms with considerable delays from the onset of neurodegenerative processes in the central nervous system. In this study, we investigated early and full-blown PD patients based on the analysis of their voice characteristics with the aid of the most commonly employed machine learning (ML) techniques. A custom dataset was made with hi-fi quality recordings of vocal tasks gathered from Italian healthy control subjects and PD patients, divided into early diagnosed, off-medication patients on the one hand, and mid-advanced patients treated with L-Dopa on the other. Following the current state-of-the-art, several ML pipelines were compared using different feature selection and classification algorithms, and deep learning was also explored with a custom CNN architecture. Results show how feature-based ML and deep learning achieve comparable results in terms of classification, with KNN, SVM and naïve Bayes classifiers performing similarly, with a slight edge for KNN. Much more evident is the predominance of CFS as the best feature selector. The selected features act as relevant vocal biomarkers capable of differentiating healthy subjects, early untreated PD patients and mid-advanced L-Dopa treated patients.


Speech Emotion Recognition from Social Media Voice Messages Recorded in the Wild

Gómez-Zaragozá, Marín‐Morales, Parra et al. 2020

Communications in Computer and Information Science


Speech is the most natural way for human communication, carrying the emotional state of the speaker that plays an important role in social interaction. Currently, many instant messaging apps offer the possibility of exchanging voice audios with other users. As a result, a great amount of voice data is generated every day, representing a new challenging approach for speech emotion recognition in real environments. In this study, we investigated emotion recognition from voice messages recorded in the wild using machine-learning algorithms. Unlike most research in this field, which use databases based on emotions evoked in lab environments, simulated by actors or subjectively selected from radio or TV talks, we created an ecological speech dataset with audios from real WhatsApp conversations of 30 Spanish speakers. Four external evaluators labelled each audio in terms of arousal and valence using the Self-Assessment Manikin (SAM) procedure. Pre-processing techniques were applied to the audios and different time and frequency domain features were extracted. Supervised machine learning classifiers were computed using feature reduction and hyper-parameter tuning in order to recognize the affective state of each voice message. The best recognition rate was obtained with Support Vector Machines, achieving 71.37% along the arousal dimension and 70.73% along the valence dimension. These results support the use of emotion recognition models on daily communication apps, helping to understand social human behavior and their interactions with devices in the real world.

