Research in affective computing and cognitive science has shown the importance of emotional facial and vocal expressions during human-computer and human-human interactions. However, while models exist to control the display and interactive dynamics of emotional expressions, such as smiles, in embodied agents, these techniques cannot be applied to video interactions between humans. In this work, we propose an audiovisual smile transformation algorithm able to manipulate an incoming video stream in real time to parametrically control the amount of smile seen on the user's face and heard in their voice, while preserving other characteristics such as the user's identity and the timing and content of the interaction. The transformation is composed of separate audio and visual pipelines, both based on a warping technique informed by real-time detection of audio and visual landmarks. Taken together, these two parts constitute a unique audiovisual algorithm which, in addition to providing simultaneous real-time transformations of a real person's face and voice, makes it possible to investigate how the two modalities of smiles are integrated in real-world social interactions.
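As a rough illustration of the visual side of such a pipeline (not the authors' implementation), the sketch below applies a landmark-driven piecewise-affine warp that displaces the mouth-corner landmarks by a parametric smile gain; the landmark indices, displacement magnitudes, and the use of scikit-image are all assumptions.

```python
# A minimal sketch of landmark-informed face warping with a parametric
# smile gain. Hypothetical inputs: `landmarks` from any face tracker,
# `mouth_idx` giving the two mouth-corner landmark indices.
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def smile_warp(frame, landmarks, mouth_idx, alpha=0.5):
    """Warp `frame` so that mouth corners move outward/upward by `alpha`.

    frame     : HxW(x3) float image
    landmarks : (N, 2) array of (x, y) facial landmark positions
    alpha     : smile intensity; 0 leaves the face unchanged (assumed scale)
    """
    h, w = frame.shape[:2]
    # Anchor the frame corners so the warp stays local to the face.
    corners = np.array([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]], float)
    src = np.vstack([landmarks.astype(float), corners])
    dst = src.copy()
    center = landmarks.mean(axis=0)              # rough face centre
    for i in mouth_idx:
        outward = src[i] - center
        dst[i, 0] += alpha * 0.15 * outward[0]   # pull corners sideways
        dst[i, 1] -= alpha * 10.0                # and slightly upward (pixels)
    tform = PiecewiseAffineTransform()
    tform.estimate(dst, src)                     # warp maps output -> input coords
    return warp(frame, tform)
```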
This paper presents two methods for the first Micro-Expression Spotting Challenge 2019, evaluating local temporal patterns (LTP) and local binary patterns (LBP) on the two most recent databases, SAMM and CAS(ME)². We first propose the LTP-ML method as the baseline for the challenge and then compare its results with those of the LBP-χ²-distance method. LTP patterns are extracted by applying PCA within a temporal window on several local facial regions. Micro-expression sequences are then spotted by a local classification of LTPs followed by a global fusion. The LBP-χ²-distance method compares feature differences by computing the χ² distance between LBP features within a time window; facial movements are then detected with a threshold. Performance is evaluated by leave-one-subject-out cross-validation. Overlapping frames are used to determine true positives, and the F1-score is used to compare spotting performance across the databases. The F1-scores of the LTP-ML method on SAMM and CAS(ME)² are 0.0316 and 0.0179, respectively. The results show that our proposed LTP-ML method outperforms the LBP-χ²-distance method in terms of F1-score on both databases.
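The following sketch illustrates the LBP-χ²-distance idea described above: compare the LBP histogram of the current frame against the average of the histograms at the two ends of a sliding window, and flag frames whose χ² distance exceeds a threshold. The window size, threshold, and the exact comparison scheme are illustrative assumptions, not the paper's parameters.

```python
# A minimal sketch of χ²-distance spotting over LBP histograms.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hist(gray, P=8, R=1):
    """Uniform LBP histogram of a grayscale face (or facial region)."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def chi2(h1, h2, eps=1e-10):
    """χ² distance between two normalised histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def spot(frames, k=6, thresh=0.05):
    """Return indices of frames whose χ² feature difference exceeds `thresh`.

    frames : list of 2-D grayscale face images
    k      : half window size (assumed; tune per database frame rate)
    """
    hists = [lbp_hist(f) for f in frames]
    spotted = []
    for t in range(k, len(frames) - k):
        avg = 0.5 * (hists[t - k] + hists[t + k])  # window-end average
        if chi2(hists[t], avg) > thresh:
            spotted.append(t)
    return spotted
```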
As various databases of facial expressions have been made accessible over the last few decades, the Facial Expression Recognition (FER) task has attracted considerable interest. The multiple sources of the available databases raise several challenges for the FER task. These challenges are usually addressed with Convolutional Neural Network (CNN) architectures. Unlike CNN models, the Transformer model, based on an attention mechanism, has recently been applied to vision tasks. One of the major issues with Transformers is the need for large amounts of training data, while most FER databases are small compared to those of other vision applications. We therefore propose in this paper to learn a vision Transformer jointly with a Squeeze-and-Excitation (SE) block for the FER task. The proposed method is evaluated on several publicly available FER databases, including CK+, JAFFE, RAF-DB and SFEW. Experiments demonstrate that our model outperforms state-of-the-art methods on CK+ and SFEW and achieves competitive results on JAFFE and RAF-DB.
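For reference, a standard SE block of the kind combined with the vision Transformer here can be sketched as below; the reduction ratio, the PyTorch framework, and the point where the block is inserted (gating the Transformer's token features before the classifier) are assumptions, not the authors' exact specification.

```python
# A minimal PyTorch sketch of a Squeeze-and-Excitation (SE) block applied
# to Transformer token features.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # squeeze
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # excite
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, channels) Transformer features
        s = x.mean(dim=1)            # global average over tokens
        w = self.fc(s).unsqueeze(1)  # per-channel gates in (0, 1)
        return x * w                 # recalibrate channel responses

# Usage: reweight ViT output features before the FER classification head.
feats = torch.randn(8, 197, 768)     # e.g. ViT-Base token embeddings
gated = SEBlock(768)(feats)          # same shape, channel-reweighted
```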