2018 13th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP)
DOI: 10.1109/smap.2018.8501881
Speech Emotion Recognition Adapted to Multimodal Semantic Repositories

Cited by 17 publications (16 citation statements) · References 11 publications
“…Finally, a question arises whether a ground truth database may be formulated containing emotionally "loaded" utterances, utilizing such techniques as, e.g., crowdsourcing [57] applied for both "producing" emotions in speech as well as evaluating gathered utterances. Such an experiment may result in more reliable datasets for the in-depth training process.…”
Section: Discussion
confidence: 99%
“…The current methods of analyzing audio data flow are not perfect either. Speech recognition technologies are at a high level, and the current results help us analyze the semantic part of speech [13]. However, the intonation components have not yet been covered properly.…”
Section: Methods
confidence: 99%
“…In the SER field, there are three important aspects being studied and discussed in the literature: the choice of suitable acoustic features [9], the design of an appropriate classifier [10] and the generation of an emotional speech database [11][12][13]. Some works propose multimodal approaches combining visual and speech data to improve and strengthen emotion recognition systems [14,15]. It is also well attested that speech recognition systems function less efficiently when the speaker is in an emotional state [16].…”
Section: Technological Challenges
confidence: 99%
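The last citation statement names the choice of suitable acoustic features as one of the central SER design questions. As an illustrative sketch only (not taken from the cited paper, and using hypothetical function names), two classic low-level descriptors used in many SER feature sets, per-frame log-energy and zero-crossing rate, can be computed with plain NumPy:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def acoustic_features(x, frame_len=400, hop=160):
    """Return an (n_frames, 2) matrix of [log-energy, zero-crossing rate].

    These are simple low-level descriptors; real SER systems typically
    add MFCCs, pitch, and spectral statistics on top of such frames.
    """
    frames = frame_signal(x, frame_len, hop)
    # Log-energy per frame, with a small floor to avoid log(0)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    # Fraction of sample-to-sample sign changes per frame
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([log_energy, zcr], axis=1)

# Usage: a 1-second 220 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)
feats = acoustic_features(x)
print(feats.shape)  # (98, 2)
```

Such frame-level features are usually aggregated over an utterance (mean, variance, percentiles) before being fed to the classifier that the citing works discuss.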