“…This process is labour-intensive, and those undertaking primary education are often parents without professional pedagogical knowledge, making it challenging to implement this idea. The development and maturation of AI technologies represented by computer vision [YCLC19, GLY*19, Oso19] and natural language processing [ZL22] have enabled computers to accurately perceive and analyse the learning status of each student [WYG*19, WZW*19]. The educational ideal of teaching students in accordance with their aptitude can therefore be realized.…”
Nowadays, parents attach importance to their children's primary education but often lack the time and sound pedagogical principles to accompany their children's learning. Moreover, existing learning systems cannot perceive children's emotional changes, and because they rely on smart devices such as mobile phones and tablets, they may cause self-control and cognitive problems in children. To tackle these issues, we propose an intelligent companion learning system that accompanies children in learning English words, namely the Intelligent Augmented Reality Educator (IARE). IARE perceives and responds to children's engagement through an intelligent agent (IA) module and presents humanized interaction based on projective Augmented Reality (AR). Specifically, the IA perceives changes in children's learning engagement and spelling status in real time through our online lightweight temporal multiple instance attention module and character recognition module, based on which it analyses each child's learning process and gives appropriate feedback and guidance. We allow children to interact with physical letters, thus avoiding excessive interference from electronic devices. To test the efficacy of our system, we conduct a pilot study with 14 children learning English. The results show that our system can significantly improve children's intrinsic motivation and self-efficacy.
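The temporal multiple instance attention idea in this abstract can be made concrete with a small sketch: per-frame features from the most recent window are treated as instances, and gated attention pools them into one engagement prediction. This is a minimal PyTorch illustration, not the authors' implementation; the module name, feature dimensions and the gated-attention form (in the style of Ilse et al.'s attention-based MIL) are assumptions.

```python
import torch
import torch.nn as nn

class TemporalMILAttention(nn.Module):
    """Sketch of a temporal multiple-instance attention pooling head.

    A sliding window of per-frame features (the "instances") is pooled
    into one bag-level engagement prediction; the attention weights
    indicate which frames drive the prediction. Names and sizes are
    illustrative assumptions.
    """
    def __init__(self, feat_dim=128, attn_dim=64, num_levels=3):
        super().__init__()
        self.attn_V = nn.Linear(feat_dim, attn_dim)   # gated attention
        self.attn_U = nn.Linear(feat_dim, attn_dim)
        self.attn_w = nn.Linear(attn_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_levels)

    def forward(self, frames):            # frames: (T, feat_dim)
        gate = torch.tanh(self.attn_V(frames)) * torch.sigmoid(self.attn_U(frames))
        a = torch.softmax(self.attn_w(gate), dim=0)   # (T, 1) frame weights
        bag = (a * frames).sum(dim=0)                 # weighted pooling
        return self.classifier(bag), a.squeeze(-1)

# Online use: re-score the most recent window of frame features.
model = TemporalMILAttention()
window = torch.randn(30, 128)             # e.g. last 30 frames of face features
logits, weights = model(window)
engagement_level = logits.argmax().item()
```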
“…When training machine learning algorithms to recognize fine-grained emotions using post-stimuli labels, the information about which fine-grained instances represent the emotion that users labeled post-stimuli is missing. This can lead to overfitting [3], [26], [27] if all the instances are fully supervised by the post-stimuli labels.…”
Instead of predicting just one emotion for an entire activity (e.g., watching a video), fine-grained emotion recognition enables temporally precise recognition. Previous work on fine-grained emotion recognition requires segment-by-segment, fine-grained emotion labels to train the recognition algorithm. However, experiments to collect these labels are costly and time-consuming compared with collecting only one emotion label after the user has watched the stimulus (i.e., a post-stimuli emotion label). To recognize emotions at a finer granularity when training with only post-stimuli labels, we propose an emotion recognition algorithm based on Deep Multiple Instance Learning (EDMIL) using physiological signals. EDMIL recognizes fine-grained valence and arousal (V-A) labels by identifying which instances represent the post-stimuli V-A annotated by users after watching the videos. Instead of fully-supervised training, the instances are weakly supervised by the post-stimuli labels during training. The V-A of each instance is estimated from its instance gain, which indicates the probability that the instance predicts the post-stimuli label. We tested EDMIL on three datasets, CASE, MERCA and CEAP-360VR, collected in three different environments: desktop, mobile and HMD-based Virtual Reality, respectively. Recognition results validated against fine-grained V-A self-reports show that for subject-independent 3-class classification (high/neutral/low), EDMIL obtains promising accuracies: 75.63% and 79.73% for V-A on CASE, 70.51% and 67.62% for V-A on MERCA, and 65.04% and 67.05% for V-A on CEAP-360VR. Our ablation study shows that all components of EDMIL contribute to both the classification and regression tasks. Our experiments also show that (1) compared with fully-supervised learning, weakly-supervised learning can reduce the overfitting caused by the temporal mismatch between fine-grained annotations and physiological signals, (2) instance segment lengths of 1-2 s yield the highest recognition accuracies, and (3) EDMIL performs best when the post-stimuli annotation covers less than 30% or more than 60% of the video-watching session.
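To illustrate the weak-supervision idea, here is a minimal PyTorch sketch of MIL training with instance gains, in the spirit of EDMIL: each bag of short signal instances yields per-instance predictions and gains, the gain-weighted bag prediction is supervised by the single post-stimuli label, and at inference the instance-level outputs give fine-grained V-A. Layer sizes, names and the softmax gain aggregation are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class MILEmotionNet(nn.Module):
    """Sketch of weakly-supervised MIL emotion recognition: a recording
    is split into short instances, each instance receives a gain (its
    probability of predicting the post-stimuli label), and only the
    bag-level prediction is supervised during training.
    """
    def __init__(self, in_dim=32, hidden=64, num_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.instance_head = nn.Linear(hidden, num_classes)
        self.gain_head = nn.Linear(hidden, 1)

    def forward(self, instances):                    # (N, in_dim) per bag
        h = self.encoder(instances)
        inst_logits = self.instance_head(h)          # fine-grained predictions
        gains = torch.softmax(self.gain_head(h), dim=0)   # instance gains
        bag_logits = (gains * inst_logits).sum(dim=0)     # bag prediction
        return bag_logits, inst_logits, gains

model = MILEmotionNet()
bag = torch.randn(25, 32)       # e.g. 25 instances of 1-2 s signal features
post_label = torch.tensor(2)    # one post-stimuli V or A label per video
bag_logits, inst_logits, gains = model(bag)
loss = nn.functional.cross_entropy(bag_logits.unsqueeze(0),
                                   post_label.unsqueeze(0))
# At inference, inst_logits (weighted by gains) yield fine-grained labels.
```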
“…This makes the problem nontrivial and subjective, because annotators can perceive different engagement levels in the same input video. The reliability of the dataset labels is a significant concern in this setting but is often ignored by current methods [29,30,32]. As a result, deep learning models overfit to the uncertain samples and perform poorly on the validation and test sets.…”
Recognition of user interaction, in particular engagement detection, has become crucial for online working and learning environments, especially during the COVID-19 outbreak. Such recognition and detection systems significantly improve the user experience and efficiency by providing valuable feedback. In this paper, we propose a novel Engagement Detection with Multi-Task Training (ED-MTT) system that minimizes mean squared error and triplet loss together to determine the engagement level of students in an e-learning environment. The performance of this system is evaluated and compared against the state-of-the-art on a publicly available dataset as well as videos collected from real-life scenarios. The results show that ED-MTT achieves 6% lower MSE than the best state-of-the-art performance, with highly acceptable training time and lightweight feature extraction.
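A hedged sketch of the multi-task objective described above: a shared encoder feeds both a regression head trained with MSE against the engagement label and an embedding space shaped by a triplet loss, with the two terms summed. The encoder layout, feature dimension and loss weighting below are assumptions for illustration, not the ED-MTT implementation.

```python
import torch
import torch.nn as nn

# Shared encoder over pre-extracted video features (hypothetical dim 256);
# the same embedding serves regression (MSE) and metric learning (triplet).
encoder = nn.Sequential(nn.Linear(256, 64), nn.ReLU())
regressor = nn.Linear(64, 1)
mse_loss = nn.MSELoss()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

def multitask_loss(anchor_x, pos_x, neg_x, anchor_y, alpha=1.0):
    """anchor/pos share an engagement level; neg has a different one."""
    za, zp, zn = encoder(anchor_x), encoder(pos_x), encoder(neg_x)
    pred = regressor(za).squeeze(-1)        # engagement regression
    return mse_loss(pred, anchor_y) + alpha * triplet_loss(za, zp, zn)

# Example batch: 8 triplets of feature vectors, labels in [0, 1].
a, p, n = torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256)
y = torch.rand(8)
loss = multitask_loss(a, p, n, y)
loss.backward()
```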