Facial expression recognition (FER) remains a challenging problem in computer vision. Although extensive research has improved FER performance in recent years, there is still room for improvement. A common goal of FER is to classify a given face image into one of seven emotion categories: angry, disgust, fear, happy, neutral, sad, and surprise. In this paper, we propose a simple multi-layer perceptron (MLP) classifier that determines whether the current classification result is reliable. If the result is judged unreliable, we use the given face image as a query to search for similar images; in particular, facial action units are used to retrieve images with a similar facial expression. Another MLP is then trained to predict the final emotion category by aggregating the classification output vectors of the query image and its retrieved similar images. Experimental results on the FER2013 dataset demonstrate that the performance of state-of-the-art networks can be further improved by the proposed method. INDEX TERMS: Convolutional neural networks, facial expression recognition, image retrieval, facial action units.
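The retrieval-and-aggregation idea in this abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: neighbours are found by cosine similarity between action-unit (AU) feature vectors, and the final prediction averages the softmax output vectors of the query and its neighbours. All names (`au_feats`, `probs`, `k`) are hypothetical.

```python
import numpy as np

def retrieve_similar(query_au, au_feats, k=3):
    """Return indices of the k database images whose AU feature
    vectors are most similar to the query's (cosine similarity)."""
    q = query_au / np.linalg.norm(query_au)
    db = au_feats / np.linalg.norm(au_feats, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity to each image
    return np.argsort(-sims)[:k]       # indices of the k most similar

def aggregate_prediction(query_prob, query_au, au_feats, probs, k=3):
    """Average the query's class-probability vector with those of its
    retrieved neighbours; the argmax gives the final emotion class."""
    idx = retrieve_similar(query_au, au_feats, k)
    fused = (query_prob + probs[idx].sum(axis=0)) / (k + 1)
    return int(np.argmax(fused))
```

In this sketch the aggregation is a plain average; the paper instead trains a second MLP on the concatenated output vectors, which can learn a weighted combination rather than treating all neighbours equally.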
This paper addresses the problem of training convolutional neural networks (CNNs) with facial action units (AUs). In particular, we focus on the class-imbalance problem in training datasets for facial emotion classification. Because training a CNN on an imbalanced dataset tends to bias learning toward the majority classes and ultimately degrades classification accuracy, the number of training images in the minority classes must be increased so that training images are evenly distributed over all classes. However, it is difficult to find images with a similar facial emotion for this oversampling. In this paper, we propose to use AU features to retrieve images with a similar emotion. The query selection from the minority class and the AU-based retrieval are repeated until the number of training samples is balanced across all classes. In addition, to improve classification accuracy, the AU features are fused with the CNN features to train a support vector machine (SVM) for final classification. Experiments were conducted on three imbalanced facial image datasets: RAF-DB, FER2013, and ExpW. The results demonstrate that CNNs trained with AU features improve classification accuracy by 3%-4%.