2016
DOI: 10.1109/tmm.2016.2557721
Sparse Kernel Reduced-Rank Regression for Bimodal Emotion Recognition From Facial Expression and Speech

Cited by 90 publications (43 citation statements). References 50 publications.
“…It is clear from these results that an SVM learns better from higher-dimensional feature sets such as the ComParE and the STC sets, which is a phenomenon also consistently observed in [5]. Yan et al. [29] recently published a baseline result on the eNTERFACE'05 corpus using the PC feature set. They trained an SVM classifier on the PC feature set with a speaker-dependent five-fold cross-validation evaluation strategy as one of their baseline models.…”
Section: Results (supporting)
confidence: 63%
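The baseline setup quoted above (an SVM evaluated with speaker-dependent five-fold cross-validation) can be sketched as follows. This is a minimal, hypothetical illustration: the feature matrix and six-class emotion labels are random stand-ins, not the actual PC features or eNTERFACE'05 data, and the fold scheme simply shuffles utterances so each speaker may appear in both train and test folds (the "speaker-dependent" protocol).

```python
# Hypothetical sketch of a speaker-dependent five-fold SVM baseline.
# Random features/labels stand in for the real PC feature set.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))      # 120 utterances, 40-dim feature vectors
y = rng.integers(0, 6, size=120)    # six emotion classes (stand-in labels)

# Speaker-dependent protocol: folds are drawn over all utterances,
# so the same speaker can occur in both training and test folds.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(scores.mean())
```

With real features, one would substitute the extracted feature matrix and true labels; a speaker-independent protocol would instead use `GroupKFold` keyed on speaker identity.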
“…To ease model comparison, Eyben et al. [5] summarized the performance of an SVM trained on the INTERSPEECH challenge feature sets over several public corpora. Yan et al. [29] recently proposed a sparse kernel reduced-rank regression (SKRRR) method for bimodal emotion recognition from facial expressions and speech, which achieved one of the state-of-the-art performances on the eNTERFACE'05 [30] corpus.…”
Section: Related Work (mentioning)
confidence: 99%
“…where M ∈ R^{p×N} denotes the facial image matrix, ψ(M) ∈ R^{p_φ×N} denotes the mapped facial image matrix, E ∈ R^{p×c} is the projection matrix of the LPP method, B ∈ R^{p_φ×c} is the projection matrix of ψ(M), p and p_φ denote the dimensions of the facial image matrix and the mapped facial image matrix respectively, and N and c denote the number of facial images and projection vectors respectively [1], [14],…”
Section: Regression-based Robust Locality Preserving Projections (mentioning)
confidence: 99%
“…where M_L and M_S denote the low-rank term and the sparse term of the facial image matrix M respectively [3], [4], [8]–[10], and α_{M_S} is the sparsity parameter of the sparse term M_S [1], [14]. As in the RPCA [9], [10] and RR [3], [4] methods, formula (3) can also be written as the regression model of (4): arg min…”
Section: Regression-based Robust Locality Preserving Projections (mentioning)
confidence: 99%
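The low-rank plus sparse decomposition M = M_L + M_S referenced in the quote above is the core of RPCA-style methods. Below is a minimal sketch of one standard way to compute such a decomposition, via the inexact augmented Lagrange multiplier (ALM) scheme with singular-value and soft thresholding. This is a generic RPCA sketch, not the cited authors' exact formulation; the parameter choices (λ = 1/√max(m, n), the μ initialization) are common defaults, and the variable names are illustrative.

```python
# Generic RPCA sketch: split M into a low-rank part L and a sparse part S
# by inexact ALM. Not the cited papers' exact algorithm.
import numpy as np

def soft_threshold(X, tau):
    """Entrywise shrinkage toward zero by tau (sparse-term proximal step)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Shrink singular values by tau (low-rank-term proximal step)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt

def rpca(M, lam=None, tol=1e-7, max_iter=500):
    """Decompose M ≈ L + S with L low-rank and S sparse."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))      # common default weight
    mu = 0.25 * m * n / (np.abs(M).sum() + 1e-12)
    Y = np.zeros_like(M)                    # Lagrange multipliers
    S = np.zeros_like(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        R = M - L - S                       # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S

# Demo: recover a rank-3 matrix corrupted by sparse spikes.
rng = np.random.default_rng(1)
L0 = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 40))
S0 = np.zeros((40, 40))
mask = rng.random((40, 40)) < 0.05
S0[mask] = rng.normal(scale=5.0, size=mask.sum())
L, S = rpca(L0 + S0)
```

In the facial-image setting of the quote, each column of M would be a vectorized face image, M_L the clean low-rank structure shared across images, and M_S sparse corruptions such as occlusions.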