“…This process is labour-intensive, and those undertaking primary education are often parents without professional pedagogical knowledge, making it challenging to implement this idea. The development and maturation of AI technologies represented by computer vision [YCLC19, GLY*19, Oso19] and natural language processing [ZL22] have enabled computers to accurately perceive and analyse the learning status of each student [WYG*19, WZW*19]. The educational ideal of teaching students in accordance with their aptitude can therefore be realized.…”
Nowadays, parents attach importance to their children's primary education but often lack the time and sound pedagogical principles to accompany their children's learning. Moreover, existing learning systems cannot perceive children's emotional changes, and because they rely on smart devices such as mobile phones and tablets, they may cause self-control and cognitive problems in children. To tackle these issues, we propose an intelligent companion learning system that accompanies children in learning English words, namely the Intelligent Augmented Reality Educator (IARE). IARE perceives and responds to children's engagement through an intelligent agent (IA) module and presents humanized interaction based on projective Augmented Reality (AR). Specifically, the IA perceives changes in children's learning engagement and spelling status in real time through our online lightweight temporal multiple instance attention module and character recognition module, based on which it analyses each child's learning process and gives appropriate feedback and guidance. We allow children to interact with physical letters, thus avoiding excessive interference from electronic devices. To test the efficacy of our system, we conduct a pilot study with 14 children learning English. The results show that our system can significantly improve children's intrinsic motivation and self-efficacy.
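The temporal multiple instance attention idea in this abstract can be made concrete with a small sketch: per-frame features from the most recent window are treated as instances, and gated attention pools them into one engagement prediction. This is a minimal PyTorch illustration, not the authors' implementation; the module name, feature dimensions and the gated-attention form (in the style of Ilse et al.'s attention-based MIL) are assumptions.

```python
import torch
import torch.nn as nn

class TemporalMILAttention(nn.Module):
    """Sketch of a temporal multiple-instance attention pooling head.

    A sliding window of per-frame features (the "instances") is pooled
    into one bag-level engagement prediction; the attention weights
    indicate which frames drive the prediction. Names and sizes are
    illustrative assumptions.
    """
    def __init__(self, feat_dim=128, attn_dim=64, num_levels=3):
        super().__init__()
        self.attn_V = nn.Linear(feat_dim, attn_dim)   # gated attention
        self.attn_U = nn.Linear(feat_dim, attn_dim)
        self.attn_w = nn.Linear(attn_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_levels)

    def forward(self, frames):            # frames: (T, feat_dim)
        gate = torch.tanh(self.attn_V(frames)) * torch.sigmoid(self.attn_U(frames))
        a = torch.softmax(self.attn_w(gate), dim=0)   # (T, 1) frame weights
        bag = (a * frames).sum(dim=0)                 # weighted pooling
        return self.classifier(bag), a.squeeze(-1)

# Online use: re-score the most recent window of frame features.
model = TemporalMILAttention()
window = torch.randn(30, 128)             # e.g. last 30 frames of face features
logits, weights = model(window)
engagement_level = logits.argmax().item()
```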
“…When training machine learning algorithms to recognize fine-grained emotions using post-stimuli labels, the information about which fine-grained instances represent the emotion that users labeled post-stimuli is missing. This can lead to overfitting [3], [26], [27] if all the instances are fully supervised by the post-stimuli labels.…”
Instead of predicting just one emotion for an entire activity (e.g., watching a video), fine-grained emotion recognition enables temporally precise recognition. Previous work on fine-grained emotion recognition requires segment-by-segment, fine-grained emotion labels to train the recognition algorithm. However, experiments to collect these labels are costly and time-consuming compared with collecting only one emotion label after the user has watched the stimulus (i.e., a post-stimuli emotion label). To recognize emotions at a finer granularity when training with only post-stimuli labels, we propose an emotion recognition algorithm based on Deep Multiple Instance Learning (EDMIL) using physiological signals. EDMIL recognizes fine-grained valence and arousal (V-A) labels by identifying which instances represent the post-stimuli V-A annotated by users after watching the videos. Instead of fully-supervised training, the instances are weakly supervised by the post-stimuli labels during training. The V-A of each instance is estimated from its instance gain, which indicates the probability that the instance predicts the post-stimuli label. We tested EDMIL on three datasets, CASE, MERCA and CEAP-360VR, collected in three different environments: desktop, mobile and HMD-based Virtual Reality, respectively. Recognition results validated against fine-grained V-A self-reports show that for subject-independent 3-class classification (high/neutral/low), EDMIL obtains promising accuracies: 75.63% and 79.73% for V-A on CASE, 70.51% and 67.62% for V-A on MERCA, and 65.04% and 67.05% for V-A on CEAP-360VR. Our ablation study shows that all components of EDMIL contribute to both the classification and regression tasks. Our experiments also show that (1) compared with fully-supervised learning, weakly-supervised learning can reduce the overfitting caused by the temporal mismatch between fine-grained annotations and physiological signals, (2) instance segment lengths of 1-2 s yield the highest recognition accuracies, and (3) EDMIL performs best when the post-stimuli annotation covers less than 30% or more than 60% of the video-watching session.
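To illustrate the weak-supervision idea, here is a minimal PyTorch sketch of MIL training with instance gains, in the spirit of EDMIL: each bag of short signal instances yields per-instance predictions and gains, the gain-weighted bag prediction is supervised by the single post-stimuli label, and at inference the instance-level outputs give fine-grained V-A. Layer sizes, names and the softmax gain aggregation are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class MILEmotionNet(nn.Module):
    """Sketch of weakly-supervised MIL emotion recognition: a recording
    is split into short instances, each instance receives a gain (its
    probability of predicting the post-stimuli label), and only the
    bag-level prediction is supervised during training.
    """
    def __init__(self, in_dim=32, hidden=64, num_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.instance_head = nn.Linear(hidden, num_classes)
        self.gain_head = nn.Linear(hidden, 1)

    def forward(self, instances):                    # (N, in_dim) per bag
        h = self.encoder(instances)
        inst_logits = self.instance_head(h)          # fine-grained predictions
        gains = torch.softmax(self.gain_head(h), dim=0)   # instance gains
        bag_logits = (gains * inst_logits).sum(dim=0)     # bag prediction
        return bag_logits, inst_logits, gains

model = MILEmotionNet()
bag = torch.randn(25, 32)       # e.g. 25 instances of 1-2 s signal features
post_label = torch.tensor(2)    # one post-stimuli V or A label per video
bag_logits, inst_logits, gains = model(bag)
loss = nn.functional.cross_entropy(bag_logits.unsqueeze(0),
                                   post_label.unsqueeze(0))
# At inference, inst_logits (weighted by gains) yield fine-grained labels.
```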
“…This makes the problem nontrivial and subjective, because annotators can perceive different engagement levels in the same input video. The reliability of the dataset labels is a significant concern in this setting but is often ignored by current methods [29,30,32]. As a result, deep learning models overfit to the uncertain samples and perform poorly on the validation and test sets.…”
Recognition of user interaction, in particular engagement detection, has become crucial for online working and learning environments, especially during the COVID-19 outbreak. Such recognition and detection systems significantly improve the user experience and efficiency by providing valuable feedback. In this paper, we propose a novel Engagement Detection with Multi-Task Training (ED-MTT) system that minimizes mean squared error and triplet loss together to determine the engagement level of students in an e-learning environment. The performance of this system is evaluated and compared against the state-of-the-art on a publicly available dataset as well as videos collected from real-life scenarios. The results show that ED-MTT achieves 6% lower MSE than the best state-of-the-art performance, with highly acceptable training time and lightweight feature extraction.
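A hedged sketch of the multi-task objective described above: a shared encoder feeds both a regression head trained with MSE against the engagement label and an embedding space shaped by a triplet loss, with the two terms summed. The encoder layout, feature dimension and loss weighting below are assumptions for illustration, not the ED-MTT implementation.

```python
import torch
import torch.nn as nn

# Shared encoder over pre-extracted video features (hypothetical dim 256);
# the same embedding serves regression (MSE) and metric learning (triplet).
encoder = nn.Sequential(nn.Linear(256, 64), nn.ReLU())
regressor = nn.Linear(64, 1)
mse_loss = nn.MSELoss()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

def multitask_loss(anchor_x, pos_x, neg_x, anchor_y, alpha=1.0):
    """anchor/pos share an engagement level; neg has a different one."""
    za, zp, zn = encoder(anchor_x), encoder(pos_x), encoder(neg_x)
    pred = regressor(za).squeeze(-1)        # engagement regression
    return mse_loss(pred, anchor_y) + alpha * triplet_loss(za, zp, zn)

# Example batch: 8 triplets of feature vectors, labels in [0, 1].
a, p, n = torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256)
y = torch.rand(8)
loss = multitask_loss(a, p, n, y)
loss.backward()
```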