A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

Tamulevičius, Gintautas; Korvel, Gražina; Yayak, Anil Bora; Treigys, Povilas; Bernatavičienė, Jolita; Kostek, Bożena

doi:10.3390/electronics9101725

Cited by 27 publications (15 citation statements)

References 47 publications (55 reference statements)

“…As for cross-linguistic studies, Rajoo and Aun [37] proved the strong language-dependent nature of SER, which was further explored by Fu et al. [38], who trained algorithms with combinations of three languages, obtaining accuracies which, preliminarily, outlined the possibility of a cross-language model for German and Chinese, while Italian was not recognized as successfully, possibly due to the unbalanced dataset. Li and Akagi [39] obtained interesting results, merging widely known existing datasets, whereas Tamulevičius et al. [40] obtained high accuracies with a CNN-based approach. However, their dataset is highly unbalanced, and the emotions have been acted by non-professionals.…”

Section: Introduction (mentioning)

confidence: 99%

The Emotion Probe: On the Universality of Cross-Linguistic and Cross-Gender Speech Emotion Recognition via Machine Learning

Costantini, Parada-Cabaleiro, Casali, et al., 2022. Sensors

Machine Learning (ML) algorithms within a human–computer framework are the leading force in speech emotion recognition (SER). However, few studies explore cross-corpora aspects of SER; this work aims to explore the feasibility and characteristics of a cross-linguistic, cross-gender SER. Three ML classifiers (SVM, Naïve Bayes and MLP) are applied to acoustic features, obtained through a procedure based on Kononenko’s discretization and correlation-based feature selection. The system encompasses five emotions (disgust, fear, happiness, anger and sadness), using the Emofilm database, comprised of short clips of English movies and the respective Italian and Spanish dubbed versions, for a total of 1115 annotated utterances. The results see MLP as the most effective classifier, with accuracies higher than 90% for single-language approaches, while the cross-language classifier still yields accuracies higher than 80%. The results show cross-gender tasks to be more difficult than those involving two languages, suggesting greater differences between emotions expressed by male versus female subjects than between different languages. Four feature domains, namely, RASTA, F0, MFCC and spectral energy, are algorithmically assessed as the most effective, refining existing literature and approaches based on standard sets. To our knowledge, this is one of the first studies encompassing cross-gender and cross-linguistic assessments on SER.

“…Various kinds of emotions based on cross‐linguistic speech were established by Tamulevicius et al. [30]. The information regarding different emotions was gathered from several languages, including Spanish, English, German, Polish, Lithuanian, and Serbian. The emotions to cross‐linguistic speech dataset had attained a size greater than 10,000 clips of emotions.…”

Section: Related Work (mentioning)

confidence: 99%

A learning framework of modified deep recurrent neural network for classification and recognition of voice mood

Agarwal and Gupta, 2022. Adaptive Control & Signal

Recognition of human emotions is a basic requirement in many real-time applications. Detection of exact emotions through voice provides relevant information for various purposes. Several computational methods have been employed for the clear analysis of human emotions. Most of the previous approaches face complexities due to certain drawbacks like degraded signal quality, a requirement of high storage space, increased computational complexity, and deteriorated classification accuracy. The proposed work was implemented to gather the accurate classification result of embedded emotions and minimize the computational complexities of MDDTRNN (modified deep duck and traveler recurrent neural network). The proposed work includes four steps: preprocessing, feature extraction, feature selection, and classification. In feature extraction, the spectral and frequency features are extracted using the boosted MFCC (Mel frequency cepstral coefficients) method to improve training speed. In feature selection, the best features are selected using AAVOA (adaptive African vulture optimization algorithm). To provide optimal emotion results, the classification step is undertaken by the MDDTRNN technique. Compared with existing approaches, the proposed work shows better emotion classification outcomes: on the IEMOCAP dataset it attains an accuracy of 95.86%, precision of 93.79%, specificity of 94.28%, sensitivity of 92.89%, and an error rate of 5.266. On the EMODB dataset it attains an accuracy of 96.27%, precision of 94.83%, specificity of 93.16%, sensitivity of 94%, and an error rate of 4.982.

“…This speaker-independent approach cannot be employed if there are several speakers. The researchers (Tamulevičius et al, 2020) have performed emotion recognition from the speech data on the various emotional speech-based databases belonging to multiple languages. As per the analysis done by the researchers (Khalil et al, 2019), it is easy and efficient if we extract the emotions from the speech to determine the sentiment.…”

Section: Positive (mentioning)

confidence: 99%

Multimodal sentimental analysis for social media applications: A comprehensive review

Chandrasekaran, Nguyen, Hemanth, 2021. WIREs Data Min & Knowl

The analysis of sentiments is essential in identifying and classifying opinions regarding a source material such as a product or service. The analysis of these sentiments finds a variety of applications like product reviews, opinion polls, movie reviews on YouTube, news video analysis, and health care applications including stress and depression analysis. The traditional approach to sentiment analysis, which is based on text, involves the collection of large textual data and different algorithms to extract the sentiment information from it. Multimodal sentiment analysis, by contrast, provides methods to carry out opinion analysis based on the combination of video, audio, and text, which goes well beyond conventional text-based sentiment analysis in understanding human behaviors. The remarkable increase in the use of social media provides a large collection of multimodal data that reflects the user's sentiment on certain aspects. This multimodal sentiment analysis approach helps in classifying the polarity (positive, negative, and neutral) of the individual sentiments. Our work aims to present a survey of recent developments in analyzing multimodal sentiments (involving text, audio, and video/image), which involve human-machine interaction, and the challenges involved in analyzing them. A detailed survey of sentiment datasets, feature extraction algorithms, data fusion methods, and the efficiency of different classification techniques is presented in this work.
