The INTERSPEECH 2012 Speaker Trait Challenge aimed to provide a unified test-bed for perceived speaker traits, the first challenge of its kind. It covered personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state of the art in these three fields of research and describe the three Sub-Challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which was used to compute the baselines and was provided to the participants. Finally, we summarise the approaches and results presented by the participants to illustrate the range of techniques currently applied to these classification tasks.
This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian Mixture Models (GMMs) with Universal Background Models (UBMs), a technique well known from speaker identification and verification. Training is performed with the EM algorithm and MAP adaptation, respectively. In the second approach, a GMM is trained for each speaker in the training and test sets. The means of each model are extracted and concatenated, yielding one GMM supervector per speaker. These supervectors are then used as input to a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with polynomials of different degrees), an RBF kernel, and a linear GMM distance kernel based on the KL divergence. With the SVM approach, we improved the recognition rate to 74% (p < 0.001), which is in the same range as human performance.
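The supervector pipeline described above can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions the abstract does not state: each speaker GMM is obtained by MAP-adapting only the means of a pretrained diagonal-covariance UBM (relevance factor 16, a common default), and the "linear GMM distance kernel" is taken to be the KL-divergence-derived linear kernel K(a, b) = sum_i w_i (mu_i^a)^T Sigma_i^{-1} mu_i^b. All function names are hypothetical.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """Log-densities of frames X (T, D) under diagonal Gaussians (C, D) -> (T, C)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (
        D * np.log(2 * np.pi)
        + np.sum(np.log(variances), axis=1)[None, :]
        + np.sum(diff ** 2 / variances[None, :, :], axis=2)
    )


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Assumed MAP adaptation of UBM means to one speaker's frames X (T, D)."""
    # Component responsibilities, computed stably in the log domain.
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)          # (T, C)
    n = post.sum(axis=0)                               # soft counts per component
    ex = post.T @ X / np.maximum(n, 1e-10)[:, None]    # first-order statistics (C, D)
    alpha = (n / (n + relevance))[:, None]             # data-dependent adaptation weight
    return alpha * ex + (1.0 - alpha) * means


def supervector(adapted_means, weights, variances):
    """Concatenate means, scaled so a dot product equals the linear GMM kernel."""
    scaled = np.sqrt(weights)[:, None] * adapted_means / np.sqrt(variances)
    return scaled.ravel()


def linear_gmm_kernel(sv_a, sv_b):
    """K(a, b) = sum_i w_i mu_a_i^T Sigma_i^-1 mu_b_i, via scaled supervectors."""
    return float(sv_a @ sv_b)
```

Because the weighting by w_i and Sigma_i^{-1/2} is folded into the supervector, the kernel reduces to a plain dot product. Stacking the supervectors of all speakers into a matrix `S`, the Gram matrix `S @ S.T` could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`; polynomial or RBF kernels would instead be applied to the supervectors directly.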
The m-FDA scale was introduced to assess the dysarthria level of patients with Parkinson's disease (PD). Articulation features extracted from continuous speech signals to create i-vectors were the most accurate for quantifying the dysarthria level, with correlations of up to 0.69 between the predicted m-FDA scores and those assigned by the phoniatricians. When the dysarthria levels were estimated from dedicated speech exercises, namely the rapid repetition of syllables (diadochokinetic tasks, DDKs) and read texts, the correlations were 0.64 and 0.57, respectively. In addition, combining several feature sets and speech tasks improved the results, which validates the hypothesis that different tasks and feature sets contribute complementary information when assessing dysarthric speech signals. The speaker models appear promising for individual modeling to monitor the dysarthria level of PD patients. The proposed approach may help clinicians make more accurate and timely decisions about the evaluation and therapy associated with the dysarthria level of patients. It is a major step towards unobtrusive, ecological evaluation of patients with dysarthric speech that does not require attending medical appointments.
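The reported figures are correlations between predicted m-FDA scores and the scores assigned by the phoniatricians. The abstract does not say which coefficient was used, so the NumPy sketch below assumes either Pearson's r or Spearman's rho and implements both; the function names are hypothetical.

```python
import numpy as np


def pearson_r(pred, ref):
    """Pearson correlation between predicted and reference scores."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    pc = pred - pred.mean()
    rc = ref - ref.mean()
    return float(pc @ rc / np.sqrt((pc @ pc) * (rc @ rc)))


def avg_ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman_rho(pred, ref):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(avg_ranks(pred), avg_ranks(ref))
```

Pearson's r rewards a linear agreement with the clinical scores, whereas Spearman's rho only requires that the predictions order the patients consistently, which can matter for an ordinal clinical scale.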
The INTERSPEECH 2012 Speaker Trait Challenge provides, for the first time, a unified test-bed for 'perceived' speaker traits: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In this paper, we describe these three Sub-Challenges, the Challenge conditions, the baselines, and a new feature set extracted with the openSMILE toolkit and provided to the participants.