Abstract: A dynamic descriptor facilitates robust recognition of facial expressions in video sequences. The two main current approaches to such recognition are basic emotion recognition and recognition based on facial action coding system (FACS) action units. In this paper we focus on basic emotion recognition and propose a spatiotemporal feature based on the local Zernike moment in the spatial domain using motion change frequency. We also design a dynamic feature comprising a motion history image and entropy. To recognise a f…
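The motion history image mentioned in the abstract can be illustrated with a minimal numpy sketch: each new frame's motion mask stamps a timestamp value into the image while older motion decays, so recent movement appears brighter. All values and the frame-difference thresholding here are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=15.0, diff_thresh=30):
    """Update a motion history image (MHI) with one new grayscale frame.

    Pixels where motion is detected are set to the timestamp value tau;
    all other pixels decay by one, so more recent motion is brighter.
    """
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    motion_mask = diff > diff_thresh
    return np.where(motion_mask, tau, np.maximum(mhi - 1.0, 0.0))

# Toy sequence: a bright square shifts one pixel to the right each frame.
frames = []
for t in range(5):
    f = np.zeros((32, 32), dtype=np.uint8)
    f[10:20, 5 + t:15 + t] = 255
    frames.append(f)

mhi = np.zeros((32, 32), dtype=np.float64)
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, cur)

print(mhi.max(), mhi.min())
```

The resulting MHI encodes where and how recently motion occurred in a single image, which is what makes it a convenient base for dynamic expression descriptors.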
“…For instance, Liu et al. [29] present an expressionlet-based spatio-temporal manifold descriptor which shows superiority over traditional methods on FER tasks. Fan and Tjahjadi [30] provide a spatio-temporal feature based on the local Zernike moment and motion history image for dynamic FER. Yan [31] proposes collaborative discriminative multi-metric learning for FER in video sequences.…”
Section: A. Hand-designed Feature-based Methods
A key challenge of facial expression recognition (FER) in video sequences is extracting discriminative spatiotemporal features from the facial expression images in a sequence. In this paper, we propose a new method for FER in video sequences via a hybrid deep learning model. The proposed method first employs two individual deep convolutional neural networks (CNNs), a spatial CNN processing static facial images and a temporal CNN processing optical flow images, to separately learn high-level spatial and temporal features on the divided video segments. These two CNNs are fine-tuned from a pre-trained CNN model on the target video facial expression datasets. Then, the obtained segment-level spatial and temporal features are integrated in a deep fusion network built with a deep belief network (DBN) model, which jointly learns discriminative spatiotemporal features. Finally, average pooling is performed on the learned segment-level DBN features in a video sequence to produce a fixed-length global video feature representation. Based on the global video feature representations, a linear support vector machine (SVM) is employed for facial expression classification. Extensive experiments on three public video-based facial expression datasets, i.e., BAUM-1s, RML, and MMI, show the effectiveness of our proposed method, which outperforms the state-of-the-art. INDEX TERMS Facial expression recognition, spatio-temporal features, hybrid deep learning, deep convolutional neural networks, deep belief network.
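The tail of the pipeline described above (segment-level fusion, average pooling to a fixed-length video descriptor, then linear classification) can be sketched as follows. The random arrays stand in for the CNN/DBN segment features and the classifier weights; the dimensions and class count are assumed for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical segment-level features: a video divided into 8 segments,
# each yielding a 256-d spatial and a 256-d temporal feature (stand-ins
# for the spatial-CNN and temporal-CNN outputs).
n_segments, dim = 8, 256
spatial = rng.normal(size=(n_segments, dim))
temporal = rng.normal(size=(n_segments, dim))

# Segment-level fusion: concatenate spatial and temporal features
# (the paper fuses them through a DBN; concatenation is the input stage).
fused = np.concatenate([spatial, temporal], axis=1)   # shape (8, 512)

# Average pooling over segments -> one fixed-length video descriptor,
# regardless of how many segments the video was divided into.
video_feature = fused.mean(axis=0)                    # shape (512,)

# A linear SVM then scores w.x + b per expression class; illustrated
# here with random class weights for 6 basic emotions.
n_classes = 6
W = rng.normal(size=(n_classes, fused.shape[1]))
b = rng.normal(size=n_classes)
scores = W @ video_feature + b
pred = int(np.argmax(scores))
print(video_feature.shape, pred)
```

Average pooling is what makes the representation length-independent: videos with different numbers of segments all map to the same 512-d vector before classification.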
“…The face segmentation model, based on geometric information, defines the most appropriate layout for extracting face features for expression recognition. Assuming that face regions are well aligned, histogram-like features are often computed from equal-sized face grids [25]. However, apparent misalignment can be observed, primarily caused by face deformations induced by the expression itself.…”
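The grid-based histogram features mentioned in this context can be sketched minimally: divide an aligned face crop into equal-sized cells and concatenate per-cell intensity histograms. The grid size and bin count below are arbitrary illustrative choices, and plain intensity histograms stand in for the LBP-style descriptors typically used.

```python
import numpy as np

def grid_histogram_feature(face, grid=(4, 4), bins=16):
    """Concatenate per-cell intensity histograms from an equal-sized grid.

    Assumes the face crop is already aligned; misalignment (e.g. from
    expression-induced deformation) shifts content across cell borders,
    which is exactly the weakness the quoted passage points out.
    """
    h, w = face.shape
    ch, cw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = face[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            hist, _ = np.histogram(cell, bins=bins, range=(0, 256))
            feats.append(hist / max(cell.size, 1))  # normalise per cell
    return np.concatenate(feats)

# Synthetic 64x64 grayscale "face" with a smooth intensity gradient.
face = (np.arange(64 * 64) % 256).reshape(64, 64).astype(np.uint8)
feat = grid_histogram_feature(face)
print(feat.shape)  # 4*4 cells x 16 bins = 256-d descriptor
```

Because each histogram is computed within a fixed cell, any pixel that crosses a cell border due to deformation changes two histograms at once, which illustrates the misalignment sensitivity discussed above.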
In this paper, we develop a new method that recognizes facial expressions on the basis of a novel Local Motion Patterns (LMP) feature. The LMP feature analyzes the local motion distribution in order to separate consistent movement patterns from noise. Indeed, facial motion extracted from the face is generally noisy, and without specific processing it can hardly meet the requirements of expression recognition, especially for micro-expressions. Direction and magnitude statistical profiles are jointly analyzed in order to filter out noise. This work presents three main contributions. The first is the analysis of facial skin temporal elasticity and face deformations during an expression. The second is a unified approach to both macro- and micro-expression recognition, leading the way to supporting a wide range of expression intensities. The third is a step towards in-the-wild expression recognition, dealing with challenges such as varying intensities, varying expression activation patterns, illumination variations and small head pose variations. Our method outperforms state-of-the-art methods for micro-expression recognition and positions itself among the top-ranked state-of-the-art methods for macro-expression recognition.
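The idea of jointly analyzing direction and magnitude profiles to separate consistent motion from noise can be sketched crudely: a local flow patch is kept only if enough pixels move and a single direction bin dominates the direction histogram. The thresholds and the filtering rule below are illustrative assumptions, not the LMP feature's actual statistical analysis.

```python
import numpy as np

def is_consistent_motion(flow, mag_thresh=0.5, dominance=0.4, bins=8):
    """Crude local-motion filter in the spirit of joint direction/magnitude
    profiling: keep a patch only if enough pixels move (magnitude) and one
    direction bin dominates (direction); scattered directions count as noise.
    """
    mag = np.hypot(flow[..., 0], flow[..., 1])
    moving = mag > mag_thresh
    if moving.mean() < 0.1:          # too few moving pixels in the patch
        return False
    ang = np.arctan2(flow[..., 1], flow[..., 0])[moving]
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi))
    return bool(hist.max() / hist.sum() >= dominance)

rng = np.random.default_rng(1)
coherent = np.zeros((8, 8, 2))
coherent[..., 0] = 1.0                                   # all pixels move right
noisy = rng.normal(scale=1.0, size=(8, 8, 2))            # random directions
print(is_consistent_motion(coherent), is_consistent_motion(noisy))
```

Real LMP processing is more involved, but the contrast above captures the core intuition: coherent facial-skin movement concentrates in one direction, while sensor and flow-estimation noise spreads across all of them.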
“…Thus, such models are robust against head pose variation and registration errors. The efforts of [5,22,52], and [53] involve dividing the face into multiple blocks and encoding the face as the concatenation of the individual block representations. In FACS, each facial muscle movement is associated with an AU, and combinations of certain AUs are considered for expression classification.…”
Facial expression recognition aims to accurately interpret facial muscle movements in affective states (emotions). Previous studies have proposed holistic analysis of the face, as well as the extraction of features pertaining only to specific facial regions, for expression recognition. While classically the latter has shown better performance, we here explore this in the context of deep learning. In particular, this work provides a performance comparison of holistic and part-based deep learning models for expression recognition. In addition, we showcase the effectiveness of skip connections, which allow a network to infer from both low- and high-level feature maps. Our results suggest that holistic models outperform part-based models in the absence of skip connections. Finally, based on our findings, we propose a data augmentation scheme, which we incorporate in a part-based model. The proposed multi-face multi-part (MFMP) model leverages the broader information from part-based data augmentation, where we train the network using facial parts extracted from different face samples of the same expression class. Extensive experiments on publicly available datasets show a significant improvement in facial expression classification with the proposed MFMP framework.
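The multi-face multi-part augmentation idea (composing a training sample from facial parts of different faces within the same expression class) can be sketched as follows. The fixed horizontal bands standing in for eye, nose and mouth regions, and the constant-valued "faces", are illustrative assumptions; the actual MFMP model presumably uses detected facial-part regions.

```python
import numpy as np

def mix_parts(faces, rng):
    """Hypothetical part-based augmentation in the spirit of MFMP: build a
    new sample by taking each facial-part region from a (possibly different)
    face of the same expression class. Parts are modelled here as three
    fixed horizontal bands (roughly eyes / nose / mouth).
    """
    h = faces[0].shape[0]
    bands = [(0, h // 3), (h // 3, 2 * h // 3), (2 * h // 3, h)]
    out = np.empty_like(faces[0])
    for lo, hi in bands:
        donor = faces[rng.integers(len(faces))]  # random same-class donor
        out[lo:hi] = donor[lo:hi]
    return out

rng = np.random.default_rng(0)
# Three same-class face crops filled with distinct constants, so the
# provenance of each band is visible in the mixed result.
faces = [np.full((48, 48), v, dtype=np.uint8) for v in (10, 20, 30)]
aug = mix_parts(faces, rng)
print(sorted(set(aug.ravel().tolist())))
```

Each augmented sample is class-consistent by construction, since every donor carries the same expression label, while the combinatorial mixing widens the pool of part configurations the network sees during training.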