Guided Spatial Transformers for Facial Expression Recognition

Jiménez, Cristina Luna; Cristóbal-Martín, Jorge; Kleinlein, Ricardo; Gil-Martín, Manuel; Moya, José Manuel Huidobro; Fernández-Martínez, Fernando

doi:10.3390/app11167217

Cited by 11 publications

(5 citation statements)

References 35 publications

(44 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…For the facial emotion recognizer, we adjusted the weights of the pre-trained STN on the AffectNet dataset to apply transfer-learning strategies. The trained STN for sentiment recognition on AffectNet reached an accuracy of 70.60%, as we can see in [60]. However, applying Feature-Extraction and max.…”

Section: Facial Emotion Recognition Resultsmentioning

confidence: 58%

“…Therefore, we trained the STN again with the same database using seven emotions, the same as RAVDESS except for the 'Calm' emotion. This second model reached an accuracy of 65.90% on the AffectNet database using the same parameters and evaluation strategy as in [60].…”

Section: Facial Emotion Recognition Resultsmentioning

confidence: 94%

“…For this reason, we employed this type of architecture to solve facial emotion recognition too. Specifically, we utilized the guided STN pre-trained model on AffectNet for valence recognition using saliency masks from Luna-Jiménez et al [60] and adapted it to solve facial emotion recognition.…”

Section: Facial Emotion Recognitionmentioning

confidence: 99%

“…The pre-trained facial emotion recognition model was a modified STN that used saliency maps to capture the principal regions of a face, since the performance of these models improved if the STN had access to more processed information about the relevant figures and shapes that appear in an image, as suggested in the previous work of Luna-Jiménez et al [60].…”

Section: Feature Extractionmentioning

confidence: 99%

See 3 more Smart Citations

Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

Jiménez

Griol

Callejas

et al. 2021

Sensors

Self Cite

View full text Add to dashboard Cite

Emotion Recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as in healthcare or in road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and Fine-Tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when they were used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover new ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users’ emotional state and their combination enables improvement of system performance.

show abstract

Section: Facial Emotion Recognition Resultsmentioning

confidence: 58%

Section: Facial Emotion Recognition Resultsmentioning

confidence: 94%

Section: Facial Emotion Recognitionmentioning

confidence: 99%

Section: Feature Extractionmentioning

confidence: 99%

See 2 more Smart Citations

Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

Jiménez

Griol

Callejas

et al. 2021

Sensors

Self Cite

View full text Add to dashboard Cite

show abstract

“…Where ( , ) is the target coordinates of the normal grid of the output feature map, ( , ) is the source coordinates of the input feature map that defines the sample points, and is the affine transformation matrix which might take various transformations. Normalized height and width coordinates so that #1 $ % & , % & $ 1 are within the spatial boundaries of the output and #1 $ % ( , % ( $ 1 are within the spatial boundaries of the input [20]. We can take this mechanism and combine it with ResNet, as seen in Fig.…”

Section: B Proposed Methodsmentioning

confidence: 99%

Facial Expression Recognition Using Convolutional Neural Network with Attention Module

Khoirullah

Yudistira²,

Bachtiar³

2022

JOIV : Int. J. Inform. Visualization

View full text Add to dashboard Cite

Human Activity Recognition (HAR) is an introduction to human activities that refer to the movements performed by an individual on specific body parts. One branch of HAR is human emotion. Facial emotion is vital in human communication to help convey emotional states and intentions. Facial Expression Recognition (FER) is crucial to understanding how humans communicate. Misinterpreting Facial Expressions can lead to misunderstanding and difficulty reaching a common ground. Deep Learning can help in recognizing these facial expressions. To improve the probation of Facial Expressions Recognition, we propose ResNet attached with an Attention module to push the performance forward. This approach performs better than the standalone ResNet because the localization and sampling grid allows the model to learn how to perform spatial transformations on the input image. Consequently, it improves the model's geometric invariance and picks up the features of the expressions from the human face, resulting in better classification results. This study proves the proposed method with attention is better than without, with a test accuracy of 0.7789 on the FER dataset and 0.8327 on the FER+ dataset. It concludes that the Attention module is essential in recognizing Facial Expressions using a Convolutional Neural Network (CNN). Advice for further research first, add more datasets besides FER and FER+, and second, add a Scheduler to decrease the learning rate during the training data.

show abstract

Emotion Recognition in Video Streams Using Intramodal and Intermodal Attention Mechanisms

Mocanu

Țapu

2022

Advances in Visual Computing

View full text Add to dashboard Cite

Guided Spatial Transformers for Facial Expression Recognition

Cited by 11 publications

References 35 publications

Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

Facial Expression Recognition Using Convolutional Neural Network with Attention Module

Emotion Recognition in Video Streams Using Intramodal and Intermodal Attention Mechanisms

Contact Info

Product

Resources

About