Original citation: Fan, Xijian and Tjahjadi, Tardi (2015). A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences. Pattern Recognition, 48 (11).
“…The tables show that the framework using the simple fusion strategy of two features performs better than using each feature separately, and that the proposed fusion strategy achieves the best performance. In Table 6, we compare the proposed feature with the method of Eskil et al. [43], the static method of Lucey et al. [32] and our previous work [44], which shows that the fused feature achieves an average recognition rate of 88.30% for all seven facial expressions and outperforms the other methods. Thus, we can also conclude that the combination of the two dynamic features improves the recognition rate.…”
Section: Results (mentioning)
confidence: 96%
“…Thus, we can also conclude that the combination of two dynamic features improves the recognition rate. We also conducted an experiment on the MMI dataset, comparing the proposed framework with the method that uses LBP and SVM [37], and the methods in [45] and [44] that are evaluated using the same classification strategy of 10-fold cross-validation. The average recognition rates are shown in Table 7.…”
Section: Results (mentioning)
confidence: 99%
“…The table shows that the proposed framework outperforms all the other five methods. The result for LBP was obtained using different samples to those used in [37], and using the same classification strategy introduced in [45], which is also used in [44] and the proposed method. Although CK+ and MMI are two of the most widely used datasets for evaluating facial expression recognition methods, they are both collected in strictly controlled settings with near-frontal poses, consistent illumination and posed expressions.…”
A dynamic descriptor facilitates robust recognition of facial expressions in video sequences. The two main current approaches to the recognition are basic emotion recognition and recognition based on facial action coding system (FACS) action units. In this paper we focus on basic emotion recognition and propose a spatiotemporal feature based on the local Zernike moment in the spatial domain and motion change frequency. We also design a dynamic feature comprising the motion history image and entropy. To recognise a facial expression, a weighting strategy based on the latter feature and a sub-division of the image frame is applied to the former to enhance the dynamic information of the facial expression, followed by the application of the classical support vector machine. Experiments on the CK+ and MMI datasets using a leave-one-out cross-validation scheme demonstrate that the integrated framework achieves better performance than using each descriptor separately. Compared with six state-of-the-art methods, the proposed framework demonstrates superior performance.
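The abstract above combines a motion history image (MHI) with an entropy measure to weight face sub-regions by how much dynamic information they carry. As a rough illustration of those two ingredients only — the threshold, decay, and histogram settings below are placeholder values, not the paper's parameters — a minimal NumPy sketch might look like:

```python
import numpy as np

def motion_history_image(frames, threshold=0.1, decay=1.0 / 8):
    """Accumulate a motion history image over a grayscale frame
    sequence: pixels that moved recently are bright, older motion
    fades linearly.  `threshold` and `decay` are illustrative
    values, not the paper's settings."""
    mhi = np.zeros_like(frames[0], dtype=float)
    for prev, curr in zip(frames, frames[1:]):
        moving = np.abs(curr - prev) > threshold  # binary motion mask
        mhi = np.where(moving, 1.0, np.clip(mhi - decay, 0.0, 1.0))
    return mhi

def region_entropy(mhi, bins=16):
    """Shannon entropy of the MHI intensity histogram: a rough proxy
    for how much dynamic information a face sub-region carries."""
    hist, _ = np.histogram(mhi, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

In the framework described above, an entropy-derived weight per sub-region would then scale the corresponding spatial features before SVM classification; the exact weighting formula is given in the paper, not here.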
“…In [19], Xijian and Tjahjadi extended the spatial pyramid histogram of gradients to three-dimensional facial features. They captured both the spatial and motion information of facial expressions by integrating the extracted features with dense optical flow.…”
Human facial expression is an important means of non-verbal communication and conveys far more information visually than vocally. Facial expression recognition plays a vital role in human-machine interaction, yet recognising facial expressions by machine remains a difficult task. Face detection, feature extraction and expression classification are the three main stages in the process of Facial Expression Recognition (FER). This survey mainly covers recent work on FER techniques. It especially focuses on performance, including the efficiency and accuracy of face detection, feature extraction and classification methods. Povzetek: This paper presents a comparative study of facial expression recognition techniques.
“…Works that exploit video data focus on the importance of the temporal evolution of the input face. The system proposed by Fan and Tjahjadi [3] processes four sub-regions of the face: forehead, eyes/eyebrows, nose and mouth. They used an extension of the spatial pyramid histogram of gradients and dense optical flow to extract spatial and dynamic features from video sequences, and adopted a multi-class SVM-based classifier with one-to-one strategy to recognise facial expressions.…”
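The quoted passage mentions a multi-class SVM with a one-to-one (i.e. one-versus-one) strategy: one binary classifier is trained per pair of expression classes, and the class collecting the most pairwise votes wins. A minimal sketch of that voting step, with a hypothetical `decision_fns` interface standing in for trained binary SVMs:

```python
from itertools import combinations

def one_vs_one_predict(decision_fns, classes, x):
    """One-versus-one multi-class voting as commonly paired with
    binary SVMs.  `decision_fns[(a, b)]` is a stand-in for a trained
    pairwise classifier: it returns > 0 to vote for class a,
    otherwise class b.  This interface is illustrative, not the
    paper's implementation."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        winner = a if decision_fns[(a, b)](x) > 0 else b
        votes[winner] += 1
    return max(votes, key=votes.get)
```

With K expression classes this requires K(K-1)/2 pairwise classifiers; ties between vote counts would need an extra rule (e.g. comparing decision values), which is omitted here.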
Recognizing facial expressions from static images or video sequences is a widely studied but still challenging problem. The recent progress obtained by deep neural architectures, or by ensembles of heterogeneous models, has shown that integrating multiple input representations leads to state-of-the-art results. In particular, the appearance and the shape of the input face, or the representations of some face parts, are commonly used to boost the quality of the recognizer. This paper investigates the application of Convolutional Neural Networks (CNNs) with the aim of building a versatile recognizer of expressions in static images that can be further applied to video sequences. We first study the importance of different face parts in the recognition task, focussing on appearance and shape-related features. Then we cast the learning problem in the Semi-Supervised setting, exploiting video data where only a few frames are supervised. The unsupervised portion of the training data is used to enforce three types of coherence, namely temporal coherence, coherence among the predictions on the face parts, and coherence between the appearance- and shape-based representations. Our experimental analysis shows that coherence constraints can improve the quality of the expression recognizer, thus offering a suitable basis to profitably exploit unsupervised video sequences. Finally we present some examples with occlusions where the shape-based predictor performs better than the appearance-based one.
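One of the coherence terms described above, temporal coherence, penalises predictions that change abruptly between neighbouring frames of an unsupervised clip. The sketch below uses an assumed squared-difference form to convey the idea; it is not the paper's exact loss:

```python
import numpy as np

def temporal_coherence_penalty(probs):
    """Mean squared difference between class-probability predictions
    on consecutive frames of a clip.  `probs` has shape
    (num_frames, num_classes); smooth prediction sequences score
    near zero, abrupt changes score high.  An illustrative form of
    the temporal-coherence idea, not the paper's loss."""
    diffs = probs[1:] - probs[:-1]
    return float((diffs ** 2).sum() / max(len(probs) - 1, 1))
```

During semi-supervised training, a term like this on unlabeled frames would be added to the supervised loss, encouraging the CNN to produce stable expression predictions across a video.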