The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism

Schuller, Björn W.; Steidl, Stefan; Batliner, Anton; Vinciarelli, Alessandro; Scherer, Klaus R.; Ringeval, Fabien; Chétouani, Mohamed; Weninger, Felix; Eyben, Florian; Marchi, Erik; Mortillaro, Marcello; Salamin, Hugues; Polychroniou, Anna; Valente, Fabio; Kim, Samuel

doi:10.21437/interspeech.2013-56

Cited by 452 publications

(138 citation statements)

References 26 publications

Supporting

Mentioning

127

Contrasting


“…Other features include a harmonics to noise ratio, which was found unrelated to arousal [ 44 ], and jitter, which showed a positive correlation with depression [ 45 ]. Arousal has been easiest to detect based on voice acoustics [ 46 ]. Discrete emotion recognition based on these features in deep neural networks has also been successful [ 47 ].…”

Section: Introduction (mentioning)

confidence: 99%

In Search of State and Trait Emotion Markers in Mobile-Sensed Language: Field Study

Carlier, Niemeijer, Mestdagh et al. 2022

JMIR Ment Health


Background: Emotions and mood are important for overall well-being. Therefore, the search for continuous, effortless emotion prediction methods is an important field of study. Mobile sensing provides a promising tool and can capture one of the most telling signs of emotion: language. Objective: The aim of this study is to examine the separate and combined predictive value of mobile-sensed language data sources for detecting both momentary emotional experience as well as global individual differences in emotional traits and depression. Methods: In a 2-week experience sampling method study, we collected self-reported emotion ratings and voice recordings 10 times a day, continuous keyboard activity, and trait depression severity. We correlated state and trait emotions and depression and language, distinguishing between speech content (spoken words), speech form (voice acoustics), writing content (written words), and writing form (typing dynamics). We also investigated how well these features predicted state and trait emotions using cross-validation to select features and a hold-out set for validation. Results: Overall, the reported emotions and mobile-sensed language demonstrated weak correlations. The most significant correlations were found between speech content and state emotions and between speech form and state emotions, ranging up to 0.25. Speech content provided the best predictions for state emotions. None of the trait emotion–language correlations remained significant after correction. Among the emotions studied, valence and happiness displayed the most significant correlations and the highest predictive performance. Conclusions: Although using mobile-sensed language as an emotion marker shows some promise, correlations and predictive R² values are low.



“…The COMPARE acoustic feature set is a well established set which has shown to give consistent insights for related domains of speech analysis (Stappen et al, 2019), including states of stress (Baird et al, 2019; Stappen et al, 2021), and anxiety (Baird et al, 2020). The COMPARE feature set is also used as the baseline feature for the INTERSPEECH COMPARE challenges since 2013 (Schuller et al, 2013), and further extended in 2016 (Schuller et al, 2016). As with the 2021 COMPARE challenge (Schuller et al, 2021), we extract the features from the entire audio samples, resulting in feature sets of 6,373 static features, which are derived from the calculation of static functionals obtained from low-level descriptor (LLD) contours (Eyben et al, 2013; Schuller et al, 2013).…”

Section: Features (mentioning)

confidence: 99%
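The statement above describes static features obtained by applying functionals to low-level descriptor (LLD) contours. A minimal sketch of that idea follows, assuming a toy setup: the function name `apply_functionals`, the four functionals chosen, and the random contours are all illustrative and are not the actual COMPARE configuration, which is defined in openSMILE and is not reproduced here.

```python
import numpy as np

def apply_functionals(lld):
    """Summarize frame-level contours into one fixed-length vector.

    lld: array-like of shape (n_frames, n_llds), one column per
    low-level descriptor contour (hypothetical toy input).
    Returns a 1-D vector of length 4 * n_llds.
    """
    lld = np.asarray(lld, dtype=float)
    # Illustrative functionals only: mean, standard deviation, min, max.
    functionals = [
        lambda x: x.mean(axis=0),
        lambda x: x.std(axis=0),
        lambda x: x.min(axis=0),
        lambda x: x.max(axis=0),
    ]
    return np.concatenate([f(lld) for f in functionals])

# Made-up contours for two clips of different duration, 3 LLDs each.
rng = np.random.default_rng(0)
short_clip = rng.normal(size=(120, 3))
long_clip = rng.normal(size=(900, 3))

vec_short = apply_functionals(short_clip)
vec_long = apply_functionals(long_clip)

# Both clips map to vectors of the same length regardless of duration.
print(vec_short.shape, vec_long.shape)
```

The point of the sketch is that clips of any length yield a feature vector of the same size, which is what makes such features "static" and usable with standard classifiers.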


A Cross-Corpus Speech-Based Analysis of Escalating Negative Interactions

Lefter, Baird, Stappen et al. 2022

Front. Comput. Sci.


The monitoring of an escalating negative interaction has several benefits, particularly in security, (mental) health, and group management. The speech signal is particularly suited to this, as aspects of escalation, including emotional arousal, are proven to easily be captured by the audio signal. A challenge of applying trained systems in real-life applications is their strong dependence on the training material and limited generalization abilities. For this reason, in this contribution, we perform an extensive analysis of three corpora in the Dutch language. All three corpora are high in escalation behavior content and are annotated on alternative dimensions related to escalation. A process of label mapping resulted in two possible ground truth estimations for the three datasets as low, medium, and high escalation levels. To observe class behavior and inter-corpus differences more closely, we perform acoustic analysis of the audio samples, finding that derived labels perform similarly across each corpus, with escalation interaction increasing in pitch (F0) and intensity (dB). We explore the suitability of different speech features, data augmentation, merging corpora for training, and testing on actor and non-actor speech through our experiments. We find that the extent to which merging corpora is successful depends greatly on the similarities between label definitions before label mapping. Finally, we see that the escalation recognition task can be performed in a cross-corpus setup with hand-crafted speech features, obtaining up to 63.8% unweighted average recall (UAR) at best for a cross-corpus analysis, an increase from the inter-corpus results of 59.4% UAR.
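The abstract above reports results as unweighted average recall (UAR), which is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. A minimal sketch, assuming made-up labels for the three escalation levels; the function name and the data are illustrative only and do not come from the paper.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)    # correctly predicted samples per class
    totals = defaultdict(int)  # number of true samples per class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced example: 6 low, 3 medium, 1 high.
y_true = ["low"] * 6 + ["medium"] * 3 + ["high"] * 1
y_pred = ["low"] * 6 + ["low", "medium", "low"] + ["low"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class recall is 1 (low), 1/3 (medium), 0 (high).
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case a classifier that mostly predicts the majority class reaches 70% accuracy but only about 44% UAR, which illustrates why UAR is preferred when class sizes are uneven.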


“…Acoustic features of the Emotion data set are extracted using OpenSmile with the computational paralinguistic challenge's (COMPARE-2013) feature set (Schuller et al, 2013). Sentence embedding features are extracted with a Chinese RoBERTa pretrained model.…”

Section: Real-world Data Sets (mentioning)

confidence: 99%

Temporal-aware Language Representation Learning From Crowdsourced Labels

Yang, Zhai, Ding et al. 2021

Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)


Learning effective language representations from crowdsourced labels is crucial for many real-world machine learning tasks. A challenging aspect of this problem is that the quality of crowdsourced labels suffers from high intra- and inter-observer variability. Since high-capacity deep neural networks can easily memorize all disagreements among crowdsourced labels, directly applying existing supervised language representation learning algorithms may yield suboptimal solutions. In this paper, we propose TACMA, a temporal-aware language representation learning heuristic for crowdsourced labels with multiple annotators. The proposed approach (1) explicitly models the intra-observer variability with an attention mechanism; (2) computes and aggregates per-sample confidence scores from multiple workers to address the inter-observer disagreements. The proposed heuristic is extremely easy to implement in around 5 lines of code. The proposed heuristic is evaluated on four synthetic and four real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at https://github.com/CrowdsourcingMining/TACMA.
