Abstract. The HUMAINE project is concerned with developing interfaces that will register and respond to emotion, particularly pervasive emotion (forms of feeling, expression and action that colour most of human life). The HUMAINE Database provides naturalistic clips which record that kind of material, in multiple modalities, and labelling techniques that are suited to describing it.
Atypical visual behaviour has recently been proposed to account for much of the social misunderstanding in autism. Using an eye-tracking system and a gaze-contingent lens display, the present study explores self-monitoring of eye motion in two conditions: free visual exploration, and guided exploration in which the visual field is blurred except for the focal area of vision. During these conditions, thirteen students with High Functioning Autism Spectrum Disorders (HFASD) and fourteen typical individuals were presented with naturalistic and interactive social stimuli using virtual reality. Fixation data showed a weaker modulation of eye movements across conditions in the HFASD group, suggesting impairments in self-monitoring of gaze. Moreover, the gaze-contingent lens induced a visual behaviour in which social understanding scores were correlated with the time spent gazing at faces. The device could be useful for treating gaze-monitoring deficiencies in HFASD.
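The gaze-contingent lens described above can be illustrated with a minimal sketch: blur the whole frame except a sharp disc around the current gaze sample. The gaze coordinates, disc radius, and blur strength below are illustrative assumptions, not parameters taken from the study.

```python
# Sketch of a gaze-contingent "lens": blur the frame everywhere except a
# circular region centred on the current gaze point. Radius and blur
# strength are illustrative values only.
import cv2
import numpy as np

def gaze_contingent_blur(frame, gaze_xy, radius=80, blur_ksize=31):
    """Return the frame blurred everywhere except a sharp disc at gaze_xy."""
    blurred = cv2.GaussianBlur(frame, (blur_ksize, blur_ksize), 0)
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.circle(mask, gaze_xy, radius, 255, thickness=-1)
    mask = mask[..., None].astype(np.float32) / 255.0
    out = frame * mask + blurred * (1.0 - mask)
    return out.astype(np.uint8)

# Example: keep an 80-px window sharp around a (hypothetical) gaze sample.
# frame = cv2.imread("stimulus.png")
# display = gaze_contingent_blur(frame, gaze_xy=(320, 240))
```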
Abstract. Psychology suggests highly synchronized expressions of emotion across different modalities. Few experiments have jointly studied the relative contributions of facial expression and body posture to the overall perception of emotion. Computational models for expressive virtual characters have to consider how such combinations will be perceived by users. This paper reports on two studies exploring how subjects perceived a virtual agent. The first study evaluates the contribution of facial and postural expressions to the overall perception of basic emotion categories, as well as the valence and activation dimensions. The second study explores the impact of incongruent expressions on the perception of superposed emotions, which are known to be frequent in everyday life. Our results suggest that congruence of facial and bodily expression facilitates the recognition of emotion categories. Yet judgments were mainly based on the emotion expressed in the face, although they were nevertheless affected by posture for the perception of the activation dimension.
Keywords: evaluation of virtual agents, affective interaction, conversational and non-verbal behavior, multimodal interaction with intelligent virtual agent
One of the challenges of designing virtual humans is the definition of appropriate models of the relation between realistic emotions and the coordination of behaviors in several modalities. In this paper, we present the annotation, representation and modeling of multimodal visual behaviors occurring during complex emotions. We illustrate our work using a corpus of TV interviews. This corpus has been annotated at several levels of information: communicative acts, emotion labels, and multimodal signs. We have defined a copy-synthesis approach to drive an Embodied Conversational Agent from these different levels of information. The second part of our paper focuses on a model of complex (superposition and masking of) emotions in facial expressions of the agent. We explain how the complementary aspects of our work on the corpus and the computational model are used to specify complex emotional behaviors.
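One way to make the idea of superposed or masked expressions concrete is to compose a blended expression from two emotions by assigning facial regions to each. The sketch below assumes expressions are given as dictionaries of facial animation parameters; the upper-face/lower-face split and the parameter names are illustrative assumptions, not the exact scheme of the model described in the paper.

```python
# Sketch of composing a "superposition" expression from two emotions by
# taking the upper face from one expression and the lower face from the
# other. Parameter names and the region split are illustrative only.
UPPER_FACE = {"brow_raise", "brow_frown", "lid_open"}
LOWER_FACE = {"lip_corner_pull", "lip_press", "jaw_drop"}

def superpose(expr_a, expr_b, upper_from="a"):
    """Blend two expressions by assigning facial regions to each emotion."""
    upper_src, lower_src = (expr_a, expr_b) if upper_from == "a" else (expr_b, expr_a)
    blended = {}
    for param in UPPER_FACE | LOWER_FACE:
        source = upper_src if param in UPPER_FACE else lower_src
        blended[param] = source.get(param, 0.0)
    return blended

# Example: felt sadness kept in the upper face, masked by a social smile below.
sadness = {"brow_raise": 0.7, "lid_open": -0.3, "lip_corner_pull": -0.2}
smile = {"lip_corner_pull": 0.8, "brow_raise": 0.1}
print(superpose(sadness, smile, upper_from="a"))
```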
New technologies are drastically changing recruitment techniques. Some research projects aim at designing interactive systems that help candidates practice job interviews. Other studies aim at automatic detection of social signals (e.g., smiles, turns of speech) in videos of job interviews. These studies are limited both by the number of interviews they process and by the fact that they only analyze simulated job interviews (e.g., students pretending to apply for a fake position). Asynchronous video interviewing tools have become mature products on the human resources market, and thus a popular step in the recruitment process. As part of a project to help recruiters, we collected a corpus of more than 7,000 candidates taking asynchronous video job interviews for real positions, recording videos of themselves answering a set of questions. We propose a new hierarchical attention model called HireNet that aims at predicting the hirability of the candidates as evaluated by recruiters. In HireNet, an interview is considered as a sequence of questions and answers containing salient social signals. Two contextual sources of information are modeled in HireNet: the words contained in the question and in the job position. Our model achieves better F1-scores than previous approaches for each modality (verbal content, audio and video). Results from early and late multimodal fusion suggest that more sophisticated fusion schemes are needed to improve on the monomodal results. Finally, some examples of moments captured by the attention mechanisms suggest our model could potentially be used to help find key moments in an asynchronous job interview.
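A hierarchical attention model of this kind can be sketched as a word-level encoder with attention per answer, an answer-level encoder with attention over the whole interview, and a final hirability score. The PyTorch sketch below is a minimal illustration in the spirit of HireNet; the layer sizes, the use of GRUs, and the way the question and job text condition the attention are assumptions, not the authors' exact architecture.

```python
# Minimal PyTorch sketch of a hierarchical attention model for interview
# hirability prediction. Illustrative only; not the published HireNet code.
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Additive attention pooling, optionally conditioned on a context vector."""
    def __init__(self, dim, ctx_dim=0):
        super().__init__()
        self.score = nn.Linear(dim + ctx_dim, 1)

    def forward(self, seq, ctx=None):               # seq: (B, T, dim)
        if ctx is not None:                          # ctx: (B, ctx_dim)
            ctx = ctx.unsqueeze(1).expand(-1, seq.size(1), -1)
            scores = self.score(torch.cat([seq, ctx], dim=-1))
        else:
            scores = self.score(seq)
        weights = torch.softmax(scores, dim=1)       # (B, T, 1)
        return (weights * seq).sum(dim=1), weights

class HireNetSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True)
        self.answer_rnn = nn.GRU(2 * hid, hid, batch_first=True, bidirectional=True)
        self.ctx_enc = nn.GRU(emb_dim, hid, batch_first=True)  # question / job text
        self.word_attn = AttentionPool(2 * hid, ctx_dim=hid)
        self.answer_attn = AttentionPool(2 * hid, ctx_dim=hid)
        self.clf = nn.Linear(2 * hid, 1)

    def forward(self, answers, questions, job):
        # answers: (B, Q, T) token ids; questions: (B, Q, Tq); job: (B, Tj)
        B, Q, T = answers.shape
        _, job_ctx = self.ctx_enc(self.emb(job))             # (1, B, hid)
        job_ctx = job_ctx.squeeze(0)
        answer_reprs = []
        for q in range(Q):
            words, _ = self.word_rnn(self.emb(answers[:, q]))    # (B, T, 2*hid)
            _, q_ctx = self.ctx_enc(self.emb(questions[:, q]))   # (1, B, hid)
            pooled, _ = self.word_attn(words, q_ctx.squeeze(0))  # attend to words
            answer_reprs.append(pooled)
        answer_seq, _ = self.answer_rnn(torch.stack(answer_reprs, dim=1))
        interview, _ = self.answer_attn(answer_seq, job_ctx)     # attend to answers
        return torch.sigmoid(self.clf(interview)).squeeze(-1)    # hirability score
```

The two attention layers are what make the "key moments" reading possible: their weights indicate which words within an answer, and which answers within the interview, drive the predicted score.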