Predicting Engagement Intensity in the Wild Using Temporal Convolutional Network

Thomas, Chinchu; Nair, Nitin; Jayagopi, Dinesh Babu

doi:10.1145/3242969.3264984

Cited by 32 publications

(19 citation statements)

References 12 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In the proposed clip-level method (see Section 5.2), adding the affect features to the behavioral features reduces MSE (second row of Table 7), showing the effectiveness of the affect states in engagement level regression. After adding affect features, the MSE of the proposed method is very close to [38] and [39]. [17] 0.1000 DFSTN [7] 0.0736 body-pose features + LSTM [45] 0.0717 eye, head-pose, and AUs features + TCN [39] 0.0655 eye, head-pose, and AUs features + GRU [38] 0.0671 As can be observed in Figure 6 (d), the behavioral features of the two videos in classes 2 and 3 are different from the video in class 1.…”

Section: Tablementioning

confidence: 86%

“…Different from the end-to-end approaches, in feature-based approaches, first, multi-modal handcrafted features are extracted from videos, and then the features are fed to a classifier or regressor to output engagement [6], [7], [8], [10], [11], [12], [16], [17], [38], [39], [40], [41], [42], [43], [44], [45]. Table 1 summarizes the literature of feature-based video engagement measurement approaches focusing on their features, machine-learning models, and datasets.…”

Section: Feature-based Video Engagement Measurementmentioning

confidence: 99%

“…In addition, in the case of high engagement, the person's eye gaze direction is towards visual content. Accordingly, inspired by previous research [6], [7], [8], [10], [11], [12], [16], [17], [38], [39], [40], [41], [42], [43], [44], [45], eye location, head pose, and eye gaze direction in consecutive video frames are considered as other informative behavioral features.…”

Section: Behavioral Featuresmentioning

confidence: 99%

“…The EmotiW dataset contains videos of around 5-minute length. The videos are divided into 10-seond clips with 50% overlap [39], and video-level features, described above, are extracted from each clip. The sequence of clip-level features are analyzed by a two-layer unidirectional LSTM with 37 × 128 and 128 × 64 layers, followed by a fully-connected layer at its final time step with 64 × 1 neurons for regression in the EmotiW dataset.…”

Section: Experimental Settingmentioning

confidence: 99%

See 3 more Smart Citations

Affect-driven Ordinal Engagement Measurement from Video

Abedi¹,

Khan²

2021

Preprint

View full text Add to dashboard Cite

In education and intervention programs, person's engagement has been identified as a major factor in successful program completion. Automatic measurement of person's engagement provides useful information for instructors to meet program objectives and individualize program delivery. In this paper, we present a novel approach for video-based engagement measurement in virtual learning programs. We propose to use affect states, continuous values of valence and arousal extracted from consecutive video frames, along with a new latent affective feature vector and behavioral features for engagement measurement. Deep learning-based temporal, and traditional machine-learning-based non-temporal models are trained and validated on frame-level, and video-level features, respectively. In addition to the conventional centralized learning, we also implement the proposed method in a decentralized federated learning setting and study the effect of model personalization in engagement measurement. We evaluated the performance of the proposed method on the only two publicly available video engagement measurement datasets, DAiSEE and EmotiW, containing videos of students in online learning programs. Our experiments show a state-of-the-art engagement level classification accuracy of 63.3% and correctly classifying disengagement videos in the DAiSEE dataset and a regression mean squared error of 0.0673 on the EmotiW dataset. Our ablation study shows the effectiveness of incorporating affect states in engagement measurement. We interpret the findings from the experimental results based on psychology concepts in the field of engagement.

show abstract

Section: Tablementioning

confidence: 86%

Section: Feature-based Video Engagement Measurementmentioning

confidence: 99%

Section: Behavioral Featuresmentioning

confidence: 99%

Section: Experimental Settingmentioning

confidence: 99%

See 2 more Smart Citations

Affect-driven Ordinal Engagement Measurement from Video

Abedi¹,

Khan²

2021

Preprint

View full text Add to dashboard Cite

show abstract

“…Multimodal learning analytics has been the subject of increasing attention in recent years and has shown significant promise for modeling learning and engagement across a range of educational contexts [2,27,30,31,35,36]. For example, Sümer et al examined learner engagement using pose estimation and facial expression data in school classrooms [35].…”

Section: Multimodal Learning Analyticsmentioning

confidence: 99%

What's Fair is Fair: Detecting and Mitigating Encoded Bias in Multimodal Models of Museum Visitor Attention

Acosta

Henderson

Rowe

et al. 2021

Proceedings of the 2021 International Conference on Multimodal Interaction

View full text Add to dashboard Cite

Recent years have seen growing interest in modeling visitor engagement in museums with multimodal learning analytics. In parallel, there has also been growing concern about issues of fairness and encoded bias in machine learning models. In this paper, we investigate bias detection and mitigation techniques to address issues of algorithmic fairness in multimodal models of museum visitor visual attention. We employ slicing analysis using the Absolute Between-ROC Area (ABROCA) statistic to detect encoded bias present in multimodal models of visitor visual attention trained with facial expression and posture data from visitor interactions with a game-based museum exhibit about environmental sustainability. We investigate instances of gender bias that arise between different combinations of modalities across several machine learning techniques. We also measure the effectiveness of two different debiasing strategies-learned fair representations and reweighing-when applied to the trained multimodal visitor attention models. Results indicate that patterns of bias can arise across different modality combinations for the different visitor visual attention models, and there is often an inherent tradeoff between predictive accuracy and ABROCA. Analyses suggest that debiasing strategies tend to be more effective on multimodal models of visitor visual attention than their unimodal counterparts CCS CONCEPTS• Applied computing → Education; • Computing methodologies → Machine Learning.

show abstract

Engagement Detection with Multi-Task Training in E-Learning Environments

Copur,

Nakıp,

Scardapane

et al. 2022

Image Analysis and Processing – ICIAP 2022

View full text Add to dashboard Cite

Recognition of user interaction, in particular engagement detection, became highly crucial for online working and learning environments, especially during the COVID-19 outbreak. Such recognition and detection systems significantly improve the user experience and efficiency by providing valuable feedback. In this paper, we propose a novel Engagement Detection with Multi-Task Training (ED-MTT) system which minimizes mean squared error and triplet loss together to determine the engagement level of students in an e-learning environment. The performance of this system is evaluated and compared against the state-ofthe-art on a publicly available dataset as well as videos collected from real-life scenarios. The results show that ED-MTT achieves 6% lower MSE than the best state-of-the-art performance with highly acceptable training time and lightweight feature extraction.

show abstract

Predicting Engagement Intensity in the Wild Using Temporal Convolutional Network

Cited by 32 publications

References 12 publications

Affect-driven Ordinal Engagement Measurement from Video

Affect-driven Ordinal Engagement Measurement from Video

What's Fair is Fair: Detecting and Mitigating Encoded Bias in Multimodal Models of Museum Visitor Attention

Engagement Detection with Multi-Task Training in E-Learning Environments

Contact Info

Product

Resources

About