Speech is the most effective way for people to exchange complex information. Recognizing the emotional information contained in speech is one of the important challenges in the field of artificial intelligence. To better capture emotional features in speech signals, a parallelized convolutional recurrent neural network (PCRN) with spectral features is proposed for speech emotion recognition. First, frame-level features are extracted from each utterance, and a long short-term memory (LSTM) network learns these features frame by frame. At the same time, the deltas and delta-deltas of the log Mel-spectrogram are calculated and reconstructed into three channels (static, delta, and delta-delta); these 3-D features are learned by a convolutional neural network (CNN). Then, the two learned high-level representations are fused and batch-normalized. Finally, a softmax classifier is used to classify emotions. The PCRN model processes two different types of features in parallel to better learn subtle changes in emotion. Experimental results on four public datasets show the superiority of the proposed method over previous works.
INDEX TERMS Speech emotion recognition, parallelized convolutional recurrent neural network, convolutional neural network, long short-term memory.
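The three-channel input described above (static log-Mel plus its deltas and delta-deltas) can be sketched as follows. This is a minimal illustration, not the authors' code: the standard regression formula for deltas is assumed, and the spectrogram here is a random stand-in with hypothetical dimensions (10 frames, 4 Mel bands).

```python
import numpy as np

def deltas(feat, N=2):
    """Regression-based deltas along the time axis (rows = frames),
    d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2*sum_n n^2), with edge padding."""
    T = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    d = np.zeros_like(feat, dtype=float)
    for n in range(1, N + 1):
        d += n * (padded[N + n: N + n + T] - padded[N - n: N - n + T])
    return d / denom

# Toy "log Mel-spectrogram": 10 frames x 4 Mel bands (random stand-in).
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((10, 4))

static = log_mel
delta = deltas(static)            # first-order dynamics
delta2 = deltas(delta)            # second-order dynamics

# Stack into the 3-channel "image" the CNN branch consumes: (3, frames, bands).
x = np.stack([static, delta, delta2], axis=0)
print(x.shape)  # (3, 10, 4)
```

The delta of a constant signal is zero by construction, which is a quick sanity check on the formula.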
Existing algorithms for speech-based deception detection are severely restricted by the scarcity of labelled data, while large amounts of easily available unlabelled data go unused in practice. To address this problem, this paper proposes a semi-supervised additive-noise autoencoder model for deception detection. The model refines the semi-supervised autoencoder and consists of two-layer encoder and decoder stacks together with a classifier. Firstly, the activation function of the hidden layers is changed according to the characteristics of deceptive speech. Secondly, to prevent over-fitting during training, dropout at a specific ratio is carefully applied at each layer. Finally, the supervised classification task is attached directly to the encoder output, making the network more concise and efficient. Using the feature set specified by the INTERSPEECH 2009 Emotion Challenge, experimental results on the Columbia-SRI-Colorado (CSC) corpus and our own deception corpus show that the proposed model outperforms alternative methods with only a small amount of labelled data.
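A forward pass through the kind of architecture the abstract describes can be sketched as below. This is a hedged illustration, not the paper's implementation: the layer sizes, tanh activation, noise level, and dropout rate are all assumptions (384 matches the dimensionality of the INTERSPEECH 2009 Emotion Challenge feature set), and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical sizes: 384-dim IS09 feature vector, 2 classes (truthful/deceptive).
d_in, d_h1, d_h2, n_cls = 384, 128, 64, 2
W1 = rng.standard_normal((d_in, d_h1)) * 0.05   # encoder layer 1
W2 = rng.standard_normal((d_h1, d_h2)) * 0.05   # encoder layer 2
W3 = rng.standard_normal((d_h2, d_in)) * 0.05   # decoder (reconstruction)
Wc = rng.standard_normal((d_h2, n_cls)) * 0.05  # classifier on encoder output

def forward(x, noise_std=0.1, drop_p=0.2, train=True):
    """One pass of a (hypothetical) additive-noise semi-supervised autoencoder."""
    if train:
        x = x + noise_std * rng.standard_normal(x.shape)  # additive input noise
    h1 = np.tanh(x @ W1)
    if train:
        h1 = h1 * (rng.random(h1.shape) > drop_p)         # dropout against over-fitting
    h2 = np.tanh(h1 @ W2)
    recon = h2 @ W3              # unsupervised reconstruction branch
    probs = softmax(h2 @ Wc)     # supervised branch attached to the encoder
    return recon, probs

x = rng.standard_normal((4, d_in))   # batch of 4 utterance-level vectors
recon, probs = forward(x)
print(recon.shape, probs.shape)      # (4, 384) (4, 2)
```

Attaching the classifier directly to the encoder output, as the abstract states, lets labelled examples train the classifier head while unlabelled examples still contribute through the reconstruction branch.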