Interspeech 2020
DOI: 10.21437/interspeech.2020-0032
The INTERSPEECH 2020 Computational Paralinguistics Challenge: Elderly Emotion, Breathing & Masks

Abstract: The INTERSPEECH 2020 Computational Paralinguistics Challenge addresses three different problems for the first time in a research competition under well-defined conditions: in the Elderly Emotion Sub-Challenge, arousal and valence in the speech of elderly individuals have to be modelled as a 3-class problem; in the Breathing Sub-Challenge, breathing has to be assessed as a regression problem; and in the Mask Sub-Challenge, speech without and with a surgical mask has to be told apart. We describe the Sub-Challenges…

Cited by 86 publications (133 citation statements). References 26 publications (10 reference statements).
“…A surge in machine learning research has come from international challenges (Schuller et al., 2013; Ringeval et al., 2019), driving improvements in accuracy across multiple machine learning domains (Meer et al., 2000). However, this fast-paced environment often leaves less time for interpreting how particular features may have explicitly impacted a result, or for an explanation of a model's decision-making process.…”
Section: Methodology: Ethical Data Considerations (mentioning)
confidence: 99%
“…In the ML experiments, we employ the ComParE feature set (Schuller et al., 2013), comprising 6373 acoustic features (Eyben et al., 2015) computed by applying statistical functions to 65 Low-Level Descriptors (LLDs) extracted by the openSMILE feature extractor (Eyben et al., 2010), and a Support Vector Machine (SVM) classifier with a linear kernel from the open-source toolkit LIBLINEAR (Fan et al., 2008). Even though Deep Neural Networks (DNNs) are prevalent nowadays for ML tasks, their performance in affective computing research is not yet superior to that of classic ML procedures such as SVMs.…”
Section: Methods (mentioning)
confidence: 99%
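As an illustration of the pipeline quoted above, here is a minimal sketch using the opensmile Python package (which provides the ComParE 2016 functionals) and scikit-learn's LinearSVC (which wraps LIBLINEAR). The file paths, labels, and the complexity value C are illustrative placeholders, not values taken from the cited work.

```python
# Hypothetical sketch of the ComParE-features + linear-SVM pipeline.
# Paths, labels, and C are placeholders for illustration only.
import opensmile
from sklearn.svm import LinearSVC

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,     # 6373 functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Placeholder file lists and labels; replace with real data.
train_wavs = ["train_0001.wav", "train_0002.wav"]
test_wavs = ["test_0001.wav"]
y_train = [0, 1]

# One 6373-dimensional feature vector per utterance.
X_train = smile.process_files(train_wavs)
X_test = smile.process_files(test_wavs)

# Linear-kernel SVM; scikit-learn's LinearSVC wraps LIBLINEAR.
clf = LinearSVC(C=1e-4, max_iter=20000)
clf.fit(X_train, y_train)
print(clf.predict(X_test))
```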
“…Even though Deep Neural Networks (DNNs) are prevalent nowadays for ML tasks, their performance in affective computing research is not yet superior to that of classic ML procedures such as SVMs. This can be seen in the series of Interspeech Challenges, from Schuller et al. (2013) to Schuller et al. (2018), and might simply be due to the sparse-data problem: DNNs need very large databases, and such databases do not exist for emotion modelling. Therefore, we chose an SVM classifier, as it has only a few hyperparameters compared to recent deep learning approaches and thus gives more reliable results in terms of robustness during training; our approach focuses more on understanding and less on optimising classification.…”
Section: Methods (mentioning)
confidence: 99%
“…Extracting both hand-crafted acoustic features and deep representations of the audio signal at the frame level of all sessions. We decided to extract both acoustic and DeepSpectrum features because of their previous performance and proven ability to capture characteristics of speech (Schuller et al., 2013; Amiriparian et al., 2016, 2018; Eyben, 2016). The two feature sets differ in nature: ComParE is a hand-crafted, expert-designed feature set that covers time-dependent frame-level information of the input signals, whereas DeepSpectrum is based on the spectrograms of audio signals, focusing mostly on the time-frequency properties of the speech.…”
Section: Dataset and Features (mentioning)
confidence: 99%
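To make the contrast between the two feature types concrete, below is a minimal sketch of the DeepSpectrum idea (not the official toolkit): render a log-mel spectrogram and pass it through a pretrained image CNN used as a fixed feature extractor. The file name, mel settings, and the choice of ResNet-18 are assumptions for illustration.

```python
# Minimal sketch of the DeepSpectrum idea: spectrogram -> pretrained CNN.
# Not the official toolkit; model and sizes are illustrative assumptions.
import librosa
import torch
import torch.nn.functional as F
import torchvision.models as models

# Load audio and compute a log-mel spectrogram (placeholder file name).
y, sr = librosa.load("utterance.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel)

# Tile to 3 channels and resize to the CNN's expected input size.
img = torch.tensor(log_mel, dtype=torch.float32)
img = img.unsqueeze(0).repeat(3, 1, 1)                 # (3, n_mels, T)
img = F.interpolate(img.unsqueeze(0), size=(224, 224),
                    mode="bilinear", align_corners=False)

# Pretrained ResNet-18 with the classifier removed yields a
# 512-dimensional deep representation of the spectrogram.
cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()
cnn.eval()
with torch.no_grad():
    deep_features = cnn(img).squeeze(0)                # shape: (512,)
```

Such deep representations can then be fed to the same SVM back end as the ComParE functionals, which is what makes the two feature views directly comparable.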