Abstract:In this paper, we study the problem of facial expression recognition using a novel space-time geometric representation. We describe the temporal evolution of facial landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed-rank. Our representation has the advantage to bring naturally a second desirable quantity when comparing shapes -the spatial covariance -in addition to the conventional affineshape representation. We derive then geometric and computational t… Show more
“…From Fig. 4 and the confusion matrix in Table 2, we can observe that the two expressions: happiness and surprise are well recognized in the two datasets while the main confusions happened in the two expressions: fear and sadness, conforming to state-of-the-art results [35], [37]. Besides, we highlight the superiority of extrinsic SCDL compared to intrinsic SCDL.…”
Section: Macro-expression Recognitionsupporting
confidence: 68%
“…Similarly, on Oulu-CASIA, our best result is lower than DTAGN and higher than DTGN. On the other hand, the method of [37] achieved a better performance on both datasets compared to our method. Comparing the confusion matrices, the same method seems to better recognize the sadness expression while our method is clearly more efficient in recognizing the contempt expression.…”
The detection and tracking of human landmarks in video streams has gained in reliability partly due to the availability of affordable RGB-D sensors. The analysis of such time-varying geometric data is playing an important role in the automatic human behavior understanding. However, suitable shape representations as well as their temporal evolution, termed trajectories, often lie to nonlinear manifolds. This puts an additional constraint (i.e., nonlinearity) in using conventional Machine Learning techniques. As a solution, this paper accommodates the well-known Sparse Coding and Dictionary Learning approach to study time-varying shapes on the Kendall shape spaces of 2D and 3D landmarks. We illustrate effective coding of 3D skeletal sequences for action recognition and 2D facial landmark sequences for macro-and micro-expression recognition. To overcome the inherent nonlinearity of the shape spaces, intrinsic and extrinsic solutions were explored. As main results, shape trajectories give rise to more discriminative time-series with suitable computational properties, including sparsity and vector space structure. Extensive experiments conducted on commonly-used datasets demonstrate the competitiveness of the proposed approaches with respect to state-of-the-art.
“…From Fig. 4 and the confusion matrix in Table 2, we can observe that the two expressions: happiness and surprise are well recognized in the two datasets while the main confusions happened in the two expressions: fear and sadness, conforming to state-of-the-art results [35], [37]. Besides, we highlight the superiority of extrinsic SCDL compared to intrinsic SCDL.…”
Section: Macro-expression Recognitionsupporting
confidence: 68%
“…Similarly, on Oulu-CASIA, our best result is lower than DTAGN and higher than DTGN. On the other hand, the method of [37] achieved a better performance on both datasets compared to our method. Comparing the confusion matrices, the same method seems to better recognize the sadness expression while our method is clearly more efficient in recognizing the contempt expression.…”
The detection and tracking of human landmarks in video streams has gained in reliability partly due to the availability of affordable RGB-D sensors. The analysis of such time-varying geometric data is playing an important role in the automatic human behavior understanding. However, suitable shape representations as well as their temporal evolution, termed trajectories, often lie to nonlinear manifolds. This puts an additional constraint (i.e., nonlinearity) in using conventional Machine Learning techniques. As a solution, this paper accommodates the well-known Sparse Coding and Dictionary Learning approach to study time-varying shapes on the Kendall shape spaces of 2D and 3D landmarks. We illustrate effective coding of 3D skeletal sequences for action recognition and 2D facial landmark sequences for macro-and micro-expression recognition. To overcome the inherent nonlinearity of the shape spaces, intrinsic and extrinsic solutions were explored. As main results, shape trajectories give rise to more discriminative time-series with suitable computational properties, including sparsity and vector space structure. Extensive experiments conducted on commonly-used datasets demonstrate the competitiveness of the proposed approaches with respect to state-of-the-art.
“…AdaLBP (Zhao et al 2011) 73.54 Atlases (Guo, Zhao, and Pietikinen 2012) 75.52 ExpLet (Liu et al 2016) 76.65 Dis-ExpLet (Liu et al 2014) 79.00 Lomo (Sikka, Sharma, and Bartlett 2016) 82.10 DTAGN (Jung et al 2015) 81 Table 2 shows a comparison with previously recorded methods(including spatio-temporal and appearance based methods) (Elaiwat, Bennamoun, and Boussaid 2016;Kacem et al 2017;Hu et al 2017;Vielzeuf, Pateux, and Jurie 2017;Yao et al 2016;Ebrahimi Kahou et al 2015). On the other hand, no previously recorded methods are available on the KAIST Face MPMI datasets.…”
Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect of mode variations on the encoded spatio-temporal features using LSTMs. We show that the LSTM retains information related to the mode variation in the sequence, which is irrelevant to the task at hand (e.g. classification facial expressions). Actually, the LSTM forget mechanism is not robust enough to mode variations and preserves information that could negatively affect the encoded spatio-temporal features. We propose the mode variational LSTM to encode spatio-temporal features robust to unseen modes of variation. The mode variational LSTM modifies the original LSTM structure by adding an additional cell state that focuses on encoding the mode variation in the input sequence. To efficiently regulate what features should be stored in the additional cell state, additional gating functionality is also introduced. The effectiveness of the proposed mode variational LSTM is verified using the facial expression recognition task. Comparative experiments on publicly available datasets verified that the proposed mode variational LSTM outperforms existing methods. Moreover, a new dynamic facial expression dataset with different modes of variation, including various modes like pose and illumination variations, was collected to comprehensively evaluate the proposed mode variational LSTM. Experimental results verified that the proposed mode variational LSTM encodes spatio-temporal features robust to unseen modes of variation.
“…This issue necessitates the use of an algorithm based generally on dynamic programming to align different trajectories. Several works including [5], [6], [20] used DTW to align trajectories in a Riemannian manifold; however, this algorithm does not define a proper metric, which is indeed required in the classification phase to define a valid positive-definite kernel. As alternative solution, different works [6], [20], [23] proposed to ignore this constraint by using a variant of SVM with an arbitrary kernel without any restrictions on the kernel function.…”
In this paper, we propose a new approach for facial expression recognition using deep covariance descriptors. The solution is based on the idea of encoding local and global Deep Convolutional Neural Network (DCNN) features extracted from still images, in compact local and global covariance descriptors. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By conducting the classification of static facial expressions using Support Vector Machine (SVM) with a valid Gaussian kernel on the SPD manifold, we show that deep covariance descriptors are more effective than the standard classification with fully connected layers and softmax. Besides, we propose a completely new and original solution to model the temporal dynamic of facial expressions as deep trajectories on the SPD manifold. As an extension of the classification pipeline of covariance descriptors, we apply SVM with valid positive definite kernels derived from global alignment for deep covariance trajectories classification. By performing extensive experiments on the Oulu-CASIA, CK+, and SFEW datasets, we show that both the proposed static and dynamic approaches achieve state-of-the-art performance for facial expression recognition outperforming many recent approaches.Index Terms-Convolutional neural networks, covariance matrix, deep trajectory, facial expression recognition, symmetric positive definite manifold.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.