A novel speech emotion recognition algorithm based on wavelet kernel sparse classifier in stacked deep auto-encoder model

Wei, Pengcheng; Zhao, Yu

doi:10.1007/s00779-019-01246-9

Cited by 22 publications

(21 citation statements)

References 29 publications


“…The proposed method is compared with the methods in [22], [26], [28], [30], and [31]. Experiment results are shown in Table 3, where the evaluation standard is the average of emotion recognition rate of discrete emotional states.…”

Section: Comparative Results Of Multi-modal Fusion Emotion Model Experiments (mentioning)

confidence: 99%

“…Grid search is also used to adjust the hyperparameters of each tested machine learning model through the Spark cluster to shorten the execution time. [31] proposed a speech emotion recognition algorithm based on the superposed sparse depth model. The improvement of this algorithm is based on the automatic encoder, denoising automatic encoder and sparse automatic encoder.…”

Section: (3) Multimodal Emotion Recognition Methods (mentioning)

confidence: 99%


Expression-EEG Based Collaborative Multimodal Emotion Recognition Using Deep AutoEncoder

Zhang

2020

IEEE Access


Emotion recognition has shown many valuable roles in people's lives under the background of artificial intelligence technology. However, most existing emotion recognition methods have poor recognition performance, which prevents their promotion in practical applications. To alleviate this problem, we proposed an expression-EEG interaction multi-modal emotion recognition method using a deep automatic encoder. Firstly, decision tree is applied as objective feature selection method. Then, based on the facial expression features recognized by sparse representation, the solution vector coefficients are analyzed to determine the facial expression category of the test samples. After that, the bimodal deep automatic encoder is adopted to fuse the EEG signals and facial expression signals. The third layer of BDAE extracts features for training of supervised learning. Finally, LIBSVM classifier is used to complete classification task. We carried out experiments on a constructed video library to verify the proposed emotion recognition method. The results show that the proposed method can effectively extract and integrate high-level emotion-related features in EEG and facial expression signals. The recognition rate of discrete emotion state type and the average emotion recognition rate have been improved relatively, in which the average emotion recognition rate is 85.71%. Overall, the emotion recognition ability has been greatly improved.


“…MTC-AE contains multiple local DNNs based on different low-level descriptors with different statistical functions that are partly concatenated together, by which the structure is enabled to consider both local and global features simultaneously. Pengcheng Wei et al [30] proposed an algorithm based on an autoencoder, denoising autoencoder, and sparse autoencoder. The first layer of the structure uses a denoising autoencoder to learn a hidden feature with a larger dimension than the dimension of the input features, and the second layer employs a sparse autoencoder to learn sparse features.…”

Section: Related Work (mentioning)

confidence: 99%
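The layer arrangement described in the citation statement above (a denoising autoencoder whose hidden layer is wider than its input, feeding a sparse autoencoder) can be illustrated with a minimal NumPy sketch. This is not the cited authors' implementation: the layer sizes, sigmoid activations, masking noise, KL-divergence sparsity penalty, and all function names are assumptions chosen for illustration, and the wavelet kernel sparse classifier from the paper's title is not modelled here. Only the forward pass and the two loss terms are shown; no training loop is included.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(n_in, n_hidden):
    """Random weights for one autoencoder (separate encoder and decoder)."""
    return {
        "W_enc": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b_enc": np.zeros(n_hidden),
        "W_dec": rng.normal(0.0, 0.1, size=(n_hidden, n_in)),
        "b_dec": np.zeros(n_in),
    }

def encode(layer, x):
    return sigmoid(x @ layer["W_enc"] + layer["b_enc"])

def decode(layer, h):
    return sigmoid(h @ layer["W_dec"] + layer["b_dec"])

def denoising_loss(layer, x, drop_prob=0.3):
    """Reconstruct the clean input from a randomly masked copy (assumed noise model)."""
    mask = rng.random(x.shape) >= drop_prob
    x_hat = decode(layer, encode(layer, x * mask))
    return np.mean((x_hat - x) ** 2)

def sparse_loss(layer, h_in, rho=0.05, beta=1.0):
    """Reconstruction error plus a KL penalty pushing mean activations toward rho."""
    h = encode(layer, h_in)
    recon = np.mean((decode(layer, h) - h_in) ** 2)
    rho_hat = np.clip(h.mean(axis=0), 1e-8, 1 - 1e-8)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + beta * kl

# Hypothetical sizes: 20 acoustic features per utterance, batch of 8.
n_features = 20
x = rng.random((8, n_features))

layer1 = init_layer(n_features, 32)  # first layer: hidden dim (32) > input dim (20)
layer2 = init_layer(32, 16)          # second layer: sparse autoencoder on layer-1 codes

h1 = encode(layer1, x)    # overcomplete hidden features
h2 = encode(layer2, h1)   # features that would be passed to a classifier

loss1 = denoising_loss(layer1, x)
loss2 = sparse_loss(layer2, h1)
```

In a stacked setup of this kind, each layer is typically trained on its own loss first (greedy layer-wise pretraining), with the output of the last encoder then handed to the classifier.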

Autoencoder With Emotion Embedding for Speech Emotion Recognition

Zhang

Xue

2021

IEEE Access


An important part of the human-computer interaction process is speech emotion recognition (SER), which has been receiving more attention in recent years. However, although a wide diversity of methods has been proposed in SER, these approaches still cannot improve the performance. A key issue in the low performance of the SER system is how to effectively extract emotion-oriented features. In this paper, we propose a novel algorithm, an autoencoder with emotion embedding, to extract deep emotion features. Unlike many previous works, instance normalization, which is a common technique in the style transfer field, is introduced into our model rather than batch normalization. Furthermore, the emotion embedding path in our method can lead the autoencoder to efficiently learn a priori knowledge from the label. It can enable the model to distinguish which features are most related to human emotion. We concatenate the latent representation learned by the autoencoder and acoustic features obtained by the openSMILE toolkit. Finally, the concatenated feature vector is utilized for emotion classification. To improve the generalization of our method, a simple data augmentation approach is applied. Two publicly available and highly popular databases, IEMOCAP and EMODB, are chosen to evaluate our method. Experimental results demonstrate that the proposed model achieves significant performance improvement compared to other speech emotion recognition systems.
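The abstract contrasts instance normalization with batch normalization without giving details. As a rough illustration of the difference, the NumPy sketch below normalizes a batch of feature sequences both ways. The tensor layout `(batch, channels, time)`, the omission of learnable scale and shift parameters, and the function names are assumptions made for this sketch, not details taken from the cited paper.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each channel using statistics pooled over the batch and time axes."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) pair using its own statistics over time."""
    mean = x.mean(axis=2, keepdims=True)
    var = x.var(axis=2, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Hypothetical batch: 4 utterances, 3 feature channels, 50 frames,
# with a different constant offset per utterance.
offsets = np.array([0.0, 5.0, 10.0, 15.0])
x = rng.normal(size=(4, 3, 50)) + offsets[:, None, None]

bn = batch_norm(x)
inn = instance_norm(x)

# Per-utterance means after each normalization.
bn_sample_means = bn.mean(axis=2)
in_sample_means = inn.mean(axis=2)
```

With instance normalization every utterance ends up with zero mean per channel, so per-utterance offsets are removed; batch normalization only centers each channel across the whole batch, so those offsets survive as differences between utterances.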


“…A speech emotion recognition algorithm based on an improved stack kernel sparse depth model is proposed in Reference [13]. The algorithm is improved based on an automatic encoder, denoising automatic encoder, and sparse automatic encoder.…”

Section: Related Work (mentioning)

confidence: 99%

Expression EEG Multimodal Emotion Recognition Method Based on the Bidirectional LSTM and Attention Mechanism

Zhao

Chen

2021

Computational and Mathematical Methods in Medicine


Due to the complexity of human emotions, there are some similarities between different emotion features. The existing emotion recognition method has the problems of difficulty of character extraction and low accuracy, so the bidirectional LSTM and attention mechanism based on the expression EEG multimodal emotion recognition method are proposed. Firstly, facial expression features are extracted based on the bilinear convolution network (BCN), and EEG signals are transformed into three groups of frequency band image sequences, and BCN is used to fuse the image features to obtain the multimodal emotion features of expression EEG. Then, through the LSTM with the attention mechanism, important data is extracted in the process of timing modeling, which effectively avoids the randomness or blindness of sampling methods. Finally, a feature fusion network with a three-layer bidirectional LSTM structure is designed to fuse the expression and EEG features, which is helpful to improve the accuracy of emotion recognition. On the MAHNOB-HCI and DEAP datasets, the proposed method is tested based on the MATLAB simulation platform. Experimental results show that the attention mechanism can enhance the visual effect of the image, and compared with other methods, the proposed method can extract emotion features from expressions and EEG signals more effectively, and the accuracy of emotion recognition is higher.
