2012
DOI: 10.1145/2133366.2133372

A multitask approach to continuous five-dimensional affect sensing in natural speech

Abstract: Automatic affect recognition is important for the ability of future technical systems to interact with us socially in an intelligent way by understanding our current affective state. In recent years there has been a shift in the field of affect recognition from "in the lab" experiments with acted data to "in the wild" experiments with spontaneous and naturalistic data. Two major issues thereby are the proper segmentation of the input and adequate description and modelling of affective states. The first issue is…

Cited by 40 publications (24 citation statements)
References 47 publications (45 reference statements)
“…Results show that arousal is significantly better recognised from the acoustic features than valence. This result is in agreement with the literature, where acoustic features have always been shown to present a stronger correlation with the arousal dimension in comparison with valence [21], [25], [26], [30], [37]. The values of CCC and CC are most of the time almost identical, as the RMSE is quite low; we obtained an average RMSE of 0.068 for arousal and of 0.128 for valence over a range of 2.…”
Section: Training and Optimization Of Ssrm
Citation type: supporting
confidence: 92%
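
The relationship between CCC, CC and RMSE described in the statement above can be illustrated with a minimal sketch. It assumes the standard definitions (Pearson's correlation coefficient and Lin's concordance correlation coefficient, both computed with population statistics); the function names and toy values are hypothetical and are not taken from the citing paper.

```python
import math


def _moments(gold, pred):
    """Return means, population variances and covariance of two sequences."""
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred)) / n
    return mean_g, mean_p, var_g, var_p, cov


def pearson_cc(gold, pred):
    """Pearson correlation coefficient (CC): insensitive to shift and scale."""
    _, _, var_g, var_p, cov = _moments(gold, pred)
    return cov / math.sqrt(var_g * var_p)


def ccc(gold, pred):
    """Concordance correlation coefficient: also penalises bias and scale mismatch."""
    mean_g, mean_p, var_g, var_p, cov = _moments(gold, pred)
    return 2 * cov / (var_g + var_p + (mean_g - mean_p) ** 2)


def rmse(gold, pred):
    """Root mean squared error between the two sequences."""
    n = len(gold)
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(gold, pred)) / n)


# Hypothetical toy traces, for illustration only.
gold = [0.1, 0.4, 0.3, 0.8]
shifted = [v + 0.5 for v in gold]  # same shape, constant offset
```

With the toy traces, a constant offset leaves CC at 1 but pulls CCC below 1, because the offset enters the CCC denominator. When the RMSE is small relative to the spread of the signals, the bias and scale terms are small and the two coefficients nearly coincide, which is the effect the statement describes.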
“…However, the natural diversity found in emotion perception is usually merged when a machine learning model is trained, by averaging several evaluations from a pool of raters into a single gold standard. Whereas the use of all annotation data can help at preserving diversity in emotion perception, e. g., by using multi-task learning of each annotator [25], [26], it has the main disadvantage to increase the overall complexity of the model according to the number of available raters. The issue of synchronisation of various individual ratings for defining a gold standard has also been investigated with signal processing techniques.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…For speech emotion prediction, MTL has been frequently utilised. Eyben et al [17] firstly proposed to jointly train five different emotional dimensions for continuous emotion recognition. The experimental results have clearly indicated that the MTL model remarkably outperforms single-task-based models.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…It was reported that while ground truth hard labels performed better than soft labels, soft labels had a more similar entropy to human annotators. In [15], the inter-annotator standard deviation was used to model the variability between multiple annotators in a multi-task learning emotion recognition framework.…”
Section: Relation To Prior Work
Citation type: mentioning
confidence: 99%
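
Two annotation strategies appear in the statements above: averaging several raters into a single gold standard, and describing rater disagreement through the inter-annotator standard deviation. They can be sketched as follows. This is only an illustration under stated assumptions: the ratings are taken to be time-aligned and of equal length, the population standard deviation is used, and the function names and values are hypothetical rather than drawn from any of the cited papers.

```python
import statistics


def gold_standard(ratings):
    """Average the raters frame by frame into a single trace.

    `ratings` is a list with one equal-length sequence per annotator.
    """
    return [statistics.mean(frame) for frame in zip(*ratings)]


def inter_annotator_std(ratings):
    """Per-frame population standard deviation across annotators."""
    return [statistics.pstdev(frame) for frame in zip(*ratings)]


# Hypothetical ratings from three annotators over four frames.
ratings = [
    [0.0, 0.2, 0.4, 0.6],
    [0.0, 0.4, 0.4, 0.2],
    [0.0, 0.0, 0.4, 0.4],
]

mean_trace = gold_standard(ratings)
spread_trace = inter_annotator_std(ratings)
```

Frames on which all annotators agree have zero spread, while frames with diverging ratings have a larger spread. The averaged trace alone discards that information, which is why a model might be given the spread as an additional target.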