2020 International Conference on Electronics, Information, and Communication (ICEIC)
DOI: 10.1109/iceic49074.2020.9051332
Facial Expression Recognition in Videos: An CNN-LSTM based Model for Video Classification
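The titular architecture, a CNN-LSTM, extracts per-frame features with a convolutional network and models their temporal evolution with an LSTM before classifying the clip. Below is a minimal sketch of that pattern in PyTorch; the ResNet-18 backbone, hidden size, and 8-class output (matching the 8 RAVDESS emotions discussed in the citations below) are illustrative assumptions, not details taken from the paper.

```python
# Minimal CNN-LSTM video classifier sketch (PyTorch).
# Backbone, layer sizes, and class count are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import models

class CNNLSTMClassifier(nn.Module):
    def __init__(self, num_classes=8, hidden_size=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # keep the 512-d frame features
        self.cnn = backbone
        self.lstm = nn.LSTM(512, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))   # per-frame features: (B*T, 512)
        feats = feats.view(b, t, -1)            # regroup by clip: (B, T, 512)
        _, (h_n, _) = self.lstm(feats)          # last hidden state summarizes the clip
        return self.head(h_n[-1])               # logits: (B, num_classes)

# Toy forward pass: 2 clips of 16 frames each.
logits = CNNLSTMClassifier()(torch.randn(2, 16, 3, 224, 224))
```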

Cited by 24 publications (13 citation statements). References 12 publications.
“…After this replacement, the accuracy of the video classifier is 56.8%. This is in line with state-of-the-art results in the literature on emotion recognition from RAVDESS videos, namely 57.5% with Synchronous Graph Neural Networks (8 emotions) [50]; 61% with ConvNet-LSTM (8 emotions) [1]; 59% with an RNN (7 emotions) [9], and 82.4% with stacked autoencoders (6 emotions) [5].…”
Section: A. Dataset and Model Architecture (supporting)
Confidence: 88%
“…V, has been studied by other authors in-the-clear, i.e., without regard for privacy protection, using a variety of deep learning architectures, with reported accuracies in the 57%-82% range, depending on the number of emotion classes included in the study (6 to 8) [5], [50], [9], [1]. The ConvNet model that we trained for our experimental results in Sec.…”
Section: Related Work (mentioning)
Confidence: 99%
“…RAVDESS is class-balanced except the neutral class, which was elicited 50% fewer times than the other emotion classes. We adapted two cross-validation settings following the methods [42], [48], [27], [28], [13], [72], [44], [53], [12], [52]. The first setting considers the identities of the actors such that the training (validation) and the corresponding testing k-folds have no overlap in terms of actors (shown as actor-split= hereafter).…”
Section: A. Datasets and Evaluation Metrics (mentioning)
Confidence: 99%
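The actor-split protocol quoted above (no actor appears in both the training and testing folds) is exactly what scikit-learn's GroupKFold implements when actor identity is passed as the group key. A minimal sketch, with toy clip counts and labels standing in for real RAVDESS data:

```python
# Actor-disjoint k-fold cross-validation sketch.
# GroupKFold is a real sklearn API; the toy data below is made up.
import numpy as np
from sklearn.model_selection import GroupKFold

clips = np.arange(20).reshape(-1, 1)        # placeholder clip indices
labels = np.random.randint(0, 8, size=20)   # 8 emotion classes (as in RAVDESS)
actors = np.repeat(np.arange(5), 4)         # 5 actors, 4 clips each

for fold, (train_idx, test_idx) in enumerate(
    GroupKFold(n_splits=5).split(clips, labels, groups=actors)
):
    # No actor may appear on both sides of the split.
    assert not set(actors[train_idx]) & set(actors[test_idx])
    print(f"fold {fold}: test actors = {sorted(set(actors[test_idx]))}")
```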
“…The majority of works concentrated on unimodal learning of emotions [11], [12], [13], i.e., processing a single modality. Although there are breakthrough achievements in unimodal emotion recognition, owing to the aforementioned multimodal nature of emotion expression, such models fall short in some circumstances.…”
Section: Introduction (mentioning)
Confidence: 99%