Gaze is an important non-verbal cue involved in many facets of social interaction, such as communication, attentiveness, and attitudes. Nevertheless, extracting gaze directions visually and remotely usually suffers from large errors because of low-resolution images, inaccurate eye cropping, or large eye-shape variations across the population, among others. This paper hypothesizes that these challenges can be addressed by exploiting multimodal social cues for gaze model adaptation on top of a head-pose-independent 3D gaze estimation framework. First, a robust eye cropping refinement is achieved by combining a semantic face model with eye landmark detections, and we investigate whether temporal smoothing can overcome the limitations of instantaneous refinement. Second, to study whether social interaction conventions can serve as priors for adaptation, we exploit speaking status and head pose constraints to derive soft gaze labels and infer person-specific gaze biases using robust statistics. Experimental results on gaze coding in natural interactions from two different settings demonstrate that the two steps of our gaze adaptation method reduce gaze errors by a large margin over the baseline and generalize to several identities in challenging scenarios.
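The bias-adaptation step described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the constant-bias model, the form of the soft labels, and the use of the median as the robust estimator are all simplifying assumptions.

```python
import numpy as np

def estimate_gaze_bias(predicted, soft_labels):
    """Estimate a per-person constant gaze bias from weak labels.

    predicted:   (N, 2) yaw/pitch angles from a generic gaze estimator (degrees)
    soft_labels: (N, 2) soft gaze targets derived from social priors, e.g.
                 'the listener looks at the speaker's head' (degrees)
    Returns the per-axis bias, estimated with the median so that frames
    where the prior is wrong (outliers) have limited influence.
    """
    residuals = predicted - soft_labels   # (N, 2) per-frame errors
    return np.median(residuals, axis=0)   # robust location estimate

# Synthetic check: recover a known bias despite 20% wrong soft labels.
rng = np.random.default_rng(0)
true_bias = np.array([4.0, -2.0])
labels = rng.normal(0.0, 1.0, size=(200, 2))
preds = labels + true_bias
labels[:40] += rng.normal(0.0, 25.0, size=(40, 2))  # corrupted weak labels
bias = estimate_gaze_bias(preds, labels)
corrected = preds - bias
```

The median keeps the estimate stable even when a sizeable fraction of the prior-derived labels are wrong, which is the point of using robust statistics here.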
The strong interest children show in mobile robots makes these devices potentially powerful tools for teaching programming; the tangibility of physical objects and the sociability of interacting with them are added benefits. A key skill that programming novices must acquire is the ability to mentally trace program execution, yet the embodied, real-time nature of robots makes such tracing difficult. To address this difficulty, in this paper we propose an automatic program evaluation framework based on a robot simulator, and describe a real-time implementation that provides feedback and gamified hints to students. In a user study, we demonstrate that our hint system increases the percentage of students writing correct programs from 50% to 96% and decreases the average time to write a correct program by 30%. However, we could not show any correlation between the use of the system and students' performance on a questionnaire testing concept acquisition, which suggests that programming skills and concept understanding are different abilities. Overall, the clear performance gain shows the value of our approach for programming education using robots.
Recognizing eye movements is important for understanding gaze behavior, for instance in human communication analysis (human-human or human-robot interactions) or in diagnosis (medical conditions, reading impairments). In this paper, we address this task using remote RGB-D sensors to analyze people behaving in natural conditions. This is very challenging given that such sensors have a typical sampling rate of 30 Hz and provide low-resolution eye images (typically 36x60 pixels), and that natural scenarios introduce many variabilities in illumination, shadows, head pose, and dynamics. Hence, the gaze signals one can extract in these conditions have lower precision than those of dedicated IR eye trackers, rendering previous methods less appropriate for the task. To tackle these challenges, we propose a deep learning method that directly processes the eye image video streams to classify them into fixation, saccade, and blink classes, and distinguishes irrelevant noise (illumination, low-resolution artifacts, inaccurate eye alignment, difficult eye shapes) from true eye motion signals. Experiments on natural 4-party interactions demonstrate the benefit of our approach compared to previous methods, including deep learning models applied to gaze outputs.
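For context, a classic non-deep baseline for this segmentation task is velocity thresholding (I-VT) applied to the extracted gaze signal; it is exactly this kind of method that struggles with the noisy, low-rate signals described above. The sketch below is a generic illustration of such a baseline, not the paper's model; the threshold value and the handling of blinks via a validity mask are assumptions.

```python
import numpy as np

def ivt_classify(gaze_deg, valid, fs=30.0, saccade_thresh=100.0):
    """Velocity-threshold (I-VT) baseline over a gaze angle stream.

    gaze_deg: (N, 2) yaw/pitch angles in degrees
    valid:    (N,) bool mask, False when the eye is not visible (blink)
    fs:       sampling rate in Hz (30 Hz for the RGB-D sensors discussed)
    Returns per-frame labels: 0 = fixation, 1 = saccade, 2 = blink.
    """
    vel = np.zeros(len(gaze_deg))
    # Angular speed between consecutive frames, in degrees per second.
    vel[1:] = np.linalg.norm(np.diff(gaze_deg, axis=0), axis=1) * fs
    labels = np.where(vel > saccade_thresh, 1, 0)
    labels[~valid] = 2  # blink overrides motion labels
    return labels

# Synthetic stream: fixation, one 10-degree jump, then a one-frame blink.
gaze = np.zeros((10, 2))
gaze[5:] = [10.0, 0.0]
valid = np.ones(10, dtype=bool)
valid[8] = False
labels = ivt_classify(gaze, valid)
```

Because the velocity estimate amplifies frame-to-frame noise, a fixed threshold misfires on low-precision signals, which motivates learning the classification directly from the eye image stream instead.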
Eye gaze and facial expressions are central to face-to-face social interactions. These behavioral cues and their connections to first impressions have been widely studied in the psychology and computing literature, but limited to a single situation. Utilizing ubiquitous multimodal sensors coupled with advances in computer vision and machine learning, we investigate the connections between these behavioral cues and perceived soft skills in two diverse workplace situations (job interviews and a reception desk). Pearson's correlation analysis shows a moderate connection between certain facial expression and eye gaze cues and perceived soft skills in the job interview (r ∈ [−0.30, 0.30]) and desk (r ∈ [0.20, 0.36]) situations. Results of our computational framework to infer perceived soft skills indicate a low predictive power of eye gaze, facial expressions, and their combination in both the interview (R² ∈ [0.02, 0.21]) and desk (R² ∈ [0.05, 0.15]) situations. Our work has important implications for employee training and behavioral feedback systems.
Gaze estimation allows robots to better understand users and thus meet their needs more precisely. In this paper, we are interested in gaze sensing for analyzing collaborative tasks and manipulation behaviors in human-robot interaction (HRI), which differs from screen gazing and other communicative HRI settings. Our goal is to study the accuracy that remote vision-based gaze estimators can provide, as they are a promising alternative to current accurate but intrusive wearable sensors. In this view, our contributions are: 1) we collected and make public a labeled dataset involving manipulation tasks and gazing behaviors in an HRI context; 2) we evaluate the performance of a state-of-the-art gaze estimation system on this dataset. Our results show a low default accuracy that calibration improves, but indicate that more research is needed if one wishes to distinguish gazing at one object amongst a dozen on a table.
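Accuracy in such evaluations is commonly reported as the angular error between predicted and ground-truth 3D gaze directions. A minimal helper for this standard metric (an assumed illustration, not the paper's exact evaluation protocol):

```python
import numpy as np

def angular_error_deg(g_pred, g_true):
    """Angular error in degrees between 3D gaze direction vectors.

    Both inputs are (..., 3) arrays; vectors need not be unit-length.
    """
    g_pred = g_pred / np.linalg.norm(g_pred, axis=-1, keepdims=True)
    g_true = g_true / np.linalg.norm(g_true, axis=-1, keepdims=True)
    # Clip guards against arccos domain errors from floating-point drift.
    cos = np.clip(np.sum(g_pred * g_true, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# Orthogonal directions are 90 degrees apart.
err = angular_error_deg(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
```

Whether a given mean error suffices depends on the task geometry: distinguishing one object among a dozen on a table requires an error smaller than the angular separation between neighboring objects as seen from the subject's eyes.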
Gaze estimation is a difficult task, even for humans. As humans, however, we are good at understanding a situation and exploiting it to guess the expected visual focus of attention of people, and we usually use this information to infer people's gaze. In this article, we propose to leverage such situation-based expectations about people's visual focus of attention to collect weakly labeled gaze samples and perform person-specific calibration of gaze estimators in an unsupervised and online way. In this context, our contributions are the following: (i) we show how task contextual attention priors can be used to gather reference gaze samples, which is otherwise a cumbersome process; (ii) we propose a robust estimation framework that exploits these weak labels to estimate the calibration model parameters; and (iii) we demonstrate the applicability of this approach in two human-human and human-robot interaction settings, namely conversation and manipulation. Experiments on three datasets validate our approach, providing insights on the priors' effectiveness and on the impact of different calibration models, particularly the usefulness of taking head pose into account.
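A robust calibration fit of the kind described in (ii) can be illustrated as follows. This is a simplified sketch under assumptions the article does not necessarily make: an affine calibration model and Huber-style iteratively reweighted least squares as the robust estimator.

```python
import numpy as np

def fit_affine_calibration(pred, ref, n_iter=5, delta=2.0):
    """Fit ref ≈ pred @ A + b robustly from weakly labeled pairs.

    pred: (N, 2) raw estimator outputs (e.g. yaw/pitch in degrees)
    ref:  (N, 2) weak reference gaze derived from attention priors
    Uses Huber-style iteratively reweighted least squares so frames
    where the weak label is wrong get down-weighted.
    Returns P of shape (3, 2) stacking the affine parameters [A; b].
    """
    X = np.hstack([pred, np.ones((len(pred), 1))])  # homogeneous inputs
    w = np.ones(len(pred))
    for _ in range(n_iter):
        sw = np.sqrt(w)[:, None]
        P, *_ = np.linalg.lstsq(X * sw, ref * sw, rcond=None)
        r = np.linalg.norm(X @ P - ref, axis=1)      # per-sample residual
        w = np.where(r < delta, 1.0, delta / np.maximum(r, 1e-9))
    return P

# Synthetic check: recover a known gain/bias despite 10% wrong weak labels.
rng = np.random.default_rng(0)
pred = rng.uniform(-20, 20, size=(100, 2))
A, b = np.array([[1.1, 0.0], [0.0, 0.9]]), np.array([3.0, -1.0])
ref = pred @ A + b
ref[:10] += 20.0  # corrupted weak labels
P = fit_affine_calibration(pred, ref)
calibrated = np.hstack([pred, np.ones((100, 1))]) @ P
```

The reweighting loop is what makes the weak labels usable: a plain least-squares fit would be pulled toward the corrupted samples, while the Huber weights shrink their influence after the first iteration.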