Despite the widespread use of supervised learning methods for speech emotion recognition, they are severely restricted due to the lack of sufficient amount of labelled speech data for the training. Considering the wide availability of unlabelled speech data, therefore, this paper proposes semisupervised autoencoders to improve speech emotion recognition. The aim is to reap the benefit from the combination of the labelled data and unlabelled data. The proposed model extends a popular unsupervised autoencoder by carefully adjoining a supervised learning objective. We extensively evaluate the proposed model on the INTERSPEECH 2009 Emotion Challenge database and other four public databases in different scenarios. Experimental results demonstrate that the proposed model achieves state-of-the-art performance with a very small number of labelled data on the challenge task and other tasks, and significantly outperforms other alternative methods.
One of the serious obstacles to the applications of speech emotion recognition systems in real-life settings is the lack of generalization of the emotion classifiers. Many recognition systems often present a dramatic drop in performance when tested on speech data obtained from different speakers, acoustic environments, linguistic content, and domain conditions. In this letter, we propose a novel unsupervised domain adaptation model, called Universum autoencoders, to improve the performance of the systems evaluated in mismatched training and test conditions. To address the mismatch, our proposed model not only learns discriminative information from labeled data, but also learns to incorporate the prior knowledge from unlabeled data into the learning. Experimental results on the labeled Geneva Whispered Emotion Corpus database plus other three unlabeled databases demonstrate the effectiveness of the proposed method when compared to other domain adaptation methods.
Speech emotion recognition (SER) is a powerful tool for endowing computers with the capacity to process information about the affective states of users in human-machine interactions. Recent research has shown the effectiveness of graph embeddingbased subspace learning and extreme learning machine applied to SER, but there are still various drawbacks in these two techniques that limit their application. Regarding subspace learning, the change from linearity to nonlinearity is usually achieved through kernelization, whereas extreme learning machines only take label information into consideration at the output layer. In order to overcome these drawbacks, this paper leverages extreme learning machines for dimensionality reduction and proposes a novel framework to combine spectral regression-based subspace learning and extreme learning machines. The proposed framework contains three stages-data mapping, graph decomposition, and regression. At the data mapping stage, various mapping strategies provide different views of the samples. At the graph decomposition stage, specifically designed embedding graphs provide a possibility to better represent the structure of data through generating virtual
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.