Cameras constantly capture and track facial images and videos on cell phones, webcams etc. In the past decade, facial expression classification and recognition has been the topic of interest as facial expression analysis has a wide range of applications such as intelligent tutoring system, systems for psychological studies etc. This study reviews the latest advances in the algorithms and techniques used in distinct phases of real-time facial expression recognition. Though there are state-of-art approaches to address facial expression identification in real-time, many issues such as subjectivity-removal, occlusion, pose, low resolution, scale, variations in illumination level and identification of baseline frame still remain unaddressed. Attempts to deal with such issues for higher accuracy lead to a trade-off in efficiency. Furthermore, the goal of this study is to elaborate on these issues and highlight the solutions provided by the current approaches. This survey has helped the authors to understand that there is a need for a better strategy to address these issues without having to trade-off performance in real-time.
Extensive research effort has been focused on extracting temporal patterns from videos, to improve the accuracy of video classification using a deep neural network based approaches. In this paper, we show that long term dependency patterns may not be enough to achieve sufficient improved results. We propose the Attention-based Spatio-Temporal model (AST) for video classification, which is a self-attention model that learns to attend to spatial features using Convolutional Neural Network (CNN) and temporal features using attention mechanisms. We evaluate our model on motion dependent Action recognition (UCF-101) dataset, facial expression recognition (MMI) dataset, and micro-expression recognition (CASME2) dataset and generated real-life Facial Expression Recognition (FER) dataset and improved by 10%, 4.7% and 5.6% accuracy respectively as compared to state-of-art on the three standard datasets and a synthetic dataset as well.In our research, we performed several experiments for detecting expressions and actions, the AST model plays a vital role in selecting the frames and carry the sequential context in the real-time application as well. We also experimented by extracting the features using the Active shape model (ASM) for FER and found the AST model surpasses other approaches.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.