DEMoS: an Italian emotional speech corpus

Parada-Cabaleiro, Emilia; Costantini, Giovanni; Batliner, Anton; Schmitt, Maximilian; Schuller, Björn

doi:10.1007/s10579-019-09450-y

Cited by 41 publications

(21 citation statements)

References 65 publications

“…According to this theory, surprised and excited correspond to positive emotions; while anger, sad, hate, and fear correspond to negative emotions [18,19]. Positive emotions promote the occurrence of cognition activities, while negative emotions hinder the cognition process [20,21].…”

Section: Facial Expression Recognition (mentioning)

confidence: 99%

Machine Learning-Based Student Emotion Recognition for Business English Class

Cui

Wang

Zhao

2021

Int. J. Emerg. Technol. Learn.

Traditional English teaching model neglects student emotions, making many tired of learning. Machine learning supports end-to-end recognition of learning emotions, such that the recognition system can adaptively adjust the learning difficulty in English classroom. With the help of machine learning, this paper presents a method to extract the facial expression features of students in business English class, and establishes a student emotion recognition model, which consists of such modules as emotion mechanism, signal acquisition, analysis and recognition, emotion understanding, emotion expression, and wearable equipment. The results show that the proposed emotion recognition model monitors the real-time emotional states of each student during English learning; upon detecting frustration or boredom, machine learning will timely switch to the contents that interest the student or easier to learn, keeping the student active in learning. The research provides an end-to-end student emotion recognition system to assist with classroom teaching, and enhance the positive emotions of students in English learning.

“…The need for monolingual spoken data is growing steadily to achieve linguistic coverage in automatic speech recognition and text-to-speech research and development. Some examples to these are: the Switchboard corpus (Godfrey & Holliman, 1993), for English telephone conversational speech, the CALLHOME speech corpora, consisting of telephone conversations in several languages (Canavan, Graff, & Zipperlen, 1997), English Boston University Radio Speech Corpus (Ostendorf, Price, & Shattuck-Hufnagel, 1996), Rhapsodie (Lacheret et al, 2014), a French speech corpus with prosodic, syntactic and orthographic annotations, DEMoS (Parada-Cabaleiro et al, 2019) an Italian emotional speech corpus, RSC [footnote 3: https://rosettaproject.org/projects/300-languages/] (Georgescu et al, 2020), a Romanian read speech corpus for automatic speech recognition, TV3Parla (Külebi & Öktem, 2018) and ParlamentParla (Külebi et al, 2020), parliamentary and television speech corpora for Catalan.…”

Section: Related Work (mentioning)

confidence: 99%

Corpora compilation for prosody-informed speech processing

Öktem

Farrús

Bonafonte

2021

Lang Resources & Evaluation

Research on speech technologies necessitates spoken data, which is usually obtained through read recorded speech, and specifically adapted to the research needs. When the aim is to deal with the prosody involved in speech, the available data must reflect natural and conversational speech, which is usually costly and difficult to get. This paper presents a machine learning-oriented toolkit for collecting, handling, and visualization of speech data, using prosodic heuristic. We present two corpora resulting from these methodologies: PANTED corpus, containing 250 h of English speech from TED Talks, and Heroes corpus containing 8 h of parallel English and Spanish movie speech. We demonstrate their use in two deep learning-based applications: punctuation restoration and machine translation. The presented corpora are freely available to the research community.

“…For this study, we utilise the Database of Elicited Mood in Speech (DEMoS) [29], which is an Italian emotional speech corpus. DEMoS was collected from 68 speakers (23 females, 45 males) with 9 365 emotional and 332 neutral speech samples in total.…”

Section: Database (mentioning)

confidence: 99%

“…The 9 365 speech samples are annotated with seven classes of emotion shown in Table 1, of which all are used in our experiments. The emotions of DEMoS were induced by an arousal-valence progression [29]. To avoid speaker dependency during training, partitioning of the data (train, development, and test) was made speaker-independently with consideration to gender and emotional class balancing.…”

Section: Database (mentioning)

confidence: 99%
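
The statement above describes a speaker-independent partitioning into train, development, and test sets. The sketch below illustrates the general idea only and is not the citing authors' procedure: it assigns whole speakers to partitions, stratified by gender, so that no speaker appears in more than one partition. The function name `speaker_independent_split`, the 60/20/20 ratios, and the `(speaker_id, gender, label)` tuple layout are all assumptions made for illustration, and emotion-class balancing is deliberately left out.

```python
import random
from collections import defaultdict


def speaker_independent_split(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split samples so that each speaker lands in exactly one partition.

    samples: list of (speaker_id, gender, label) tuples (hypothetical layout).
    ratios:  assumed train/dev/test proportions, applied per gender.
    """
    names = ("train", "dev", "test")

    # Collect the distinct speakers of each gender.
    speakers_by_gender = defaultdict(set)
    for speaker, gender, _label in samples:
        speakers_by_gender[gender].add(speaker)

    # Assign whole speakers to partitions, one gender group at a time.
    rng = random.Random(seed)
    assignment = {}
    for gender in sorted(speakers_by_gender):
        ordered = sorted(speakers_by_gender[gender])
        rng.shuffle(ordered)
        cut_train = round(len(ordered) * ratios[0])
        cut_dev = cut_train + round(len(ordered) * ratios[1])
        for index, speaker in enumerate(ordered):
            if index < cut_train:
                assignment[speaker] = names[0]
            elif index < cut_dev:
                assignment[speaker] = names[1]
            else:
                assignment[speaker] = names[2]

    # Route every sample to the partition of its speaker.
    partitions = {name: [] for name in names}
    for sample in samples:
        partitions[assignment[sample[0]]].append(sample)
    return partitions
```

Splitting at the speaker level rather than the sample level is what keeps a model from being evaluated on voices it has already heard during training.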

Generating and Protecting Against Adversarial Attacks for Deep Speech-Based Emotion Recognition Models

Ren

Baird

Schuller

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The development of deep learning models for speech emotion recognition has become a popular area of research. Adversarially generated data can cause false predictions, and in an endeavor to ensure model robustness, defense methods against such attacks should be addressed. With this in mind, in this study, we aim to train deep models to defending against non-targeted white-box adversarial attacks. Adversarial data is first generated from the real data using the fast gradient sign method. Then in the research field of speech emotion recognition, adversarial-based training is employed as a method for protecting against adversarial attack. We then train deep convolutional models with both real and adversarial data, and compare the performances of two adversarial training procedures -namely, vanilla adversarial training, and similarity-based adversarial training. In our experiments, through the use of adversarial data augmentation, both of the considered adversarial training procedures can improve the performance when validated on the real data. Additionally, the similarity-based adversarial training learns a more robust model when working with adversarial data. Finally, the considered VGG-16 model performs the best across all models, for both real and generated data.
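
The abstract above names the fast gradient sign method (FGSM) as the way adversarial data is generated. As a minimal sketch of that method, assuming a toy logistic-regression classifier in place of the deep convolutional models the paper trains, the code below perturbs an input by a step of size `epsilon` in the direction of the sign of the loss gradient. The helper names `predict` and `fgsm` and all numeric values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the positive class under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm(weights, bias, x, y, epsilon):
    """Return x + epsilon * sign(dL/dx) for the cross-entropy loss L.

    For a logistic model, the gradient of the cross-entropy loss with
    respect to the input is (p - y) * w, where p is the predicted
    probability and y is the true label (0 or 1).
    """
    p = predict(weights, bias, x)
    gradient = [(p - y) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in gradient]
    return [xi + epsilon * s for xi, s in zip(x, signs)]


# Toy usage: a correctly classified positive example becomes less confident.
weights, bias = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = fgsm(weights, bias, x, y, epsilon=0.1)
print(predict(weights, bias, x), predict(weights, bias, x_adv))
```

Because the attack uses the model's own weights to compute the gradient, it is a white-box attack, and because it only increases the loss on the true label without steering toward a chosen class, it is non-targeted.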
