The universal hypothesis suggests that the six basic emotions (anger, disgust, fear, happiness, sadness, and surprise) are expressed through similar facial expressions by all humans. While existing datasets support the universal hypothesis and comprise images and videos with discrete, disjoint labels of pronounced emotions, real-life data contains jointly occurring emotions and expressions of different intensities. Models trained on categorical one-hot vectors often overfit and fail to recognize low- or moderate-intensity expressions. Motivated by the above, as well as by the lack of sufficient annotated data, we propose a weakly supervised learning technique for expression classification that leverages the information in unannotated data. Crucially, our approach first trains a convolutional neural network (CNN) with label smoothing in a supervised manner and then fine-tunes the CNN weights with labelled and unlabelled data simultaneously. Experiments on four datasets demonstrate large gains in cross-database performance and show that the proposed method learns to distinguish different expression intensities, even when trained with categorical samples.
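To make the two training stages concrete, the following is a minimal PyTorch-style sketch. The label-smoothing loss follows the standard formulation (the true class keeps probability mass 1 - smoothing, the rest is spread uniformly over the other classes); the semi-supervised step, including the soft pseudo-label term, the `unlab_weight` coefficient, and all function names, is an illustrative assumption, not the paper's exact loss for unlabelled data.

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, smoothing=0.1):
    # Standard label smoothing: replace the one-hot target with a
    # softened distribution before taking the cross-entropy.
    num_classes = logits.size(1)
    with torch.no_grad():
        soft_targets = torch.full_like(logits, smoothing / (num_classes - 1))
        soft_targets.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()

def semi_supervised_step(model, optimizer, x_lab, y_lab, x_unlab,
                         smoothing=0.1, unlab_weight=0.5):
    # One fine-tuning step combining a supervised term on the labelled
    # batch with a pseudo-label term on the unlabelled batch. Using the
    # model's own soft predictions as targets is one generic choice for
    # exploiting unannotated data; the paper's actual formulation may differ.
    model.train()
    optimizer.zero_grad()
    sup_loss = label_smoothing_loss(model(x_lab), y_lab, smoothing)
    with torch.no_grad():
        pseudo = F.softmax(model(x_unlab), dim=1)  # soft pseudo-labels
    unsup_loss = -(pseudo * F.log_softmax(model(x_unlab), dim=1)).sum(dim=1).mean()
    loss = sup_loss + unlab_weight * unsup_loss  # weighted combination (assumed)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the smoothed targets assign nonzero probability to every class, the supervised stage already discourages the overconfident one-hot predictions that the abstract identifies as the cause of overfitting at low expression intensities.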