Social Signal Processing is the research domain aimed at bridging the social intelligence gap between humans and machines. This article is the first survey of the domain that jointly considers its three major aspects, namely modeling, analysis and synthesis of social behaviour. Modeling investigates laws and principles underlying social interaction, analysis explores approaches for automatic understanding of social exchanges recorded with different sensors, and synthesis studies techniques for the generation of social behaviour via various forms of embodiment. For each of the above aspects, the paper includes an extensive survey of the literature, points to the most important publicly available resources, and outlines the most fundamental challenges ahead.
We aim at creating an expressive Embodied Conversational Agent (ECA) and address the problem of synthesizing expressive agent gestures. In our previous work, we have described the gesture selection process. In this paper, we present a computational model of gesture quality. Once a certain gesture has been chosen for execution, how can we modify it to carry a desired expressive content while retaining its original semantics? We characterize bodily expressivity with a small set of dimensions derived from a review of psychology literature. We provide a detailed description of the implementation of these dimensions in our animation system, including our gesture modeling language. We also demonstrate animations with different expressivity settings in our existing ECA system. Finally, we describe two user studies we undertook to evaluate the appropriateness of our implementation for each dimension of expressivity as well as the potential of combining these dimensions to create expressive gestures that reflect communicative intent.
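To make the idea of a gesture-quality model concrete, the sketch below shows how a small set of expressivity dimensions could rescale the keyframes of an already-selected gesture while preserving its overall shape, and hence its semantics. The dimension names (spatial extent, temporal extent, power), the keyframe structure, and the scaling factors are illustrative assumptions, not the implementation described in the paper.

```python
# A minimal sketch (not the authors' implementation) of how expressivity
# dimensions might modulate an already-selected gesture. All names, ranges,
# and scaling constants are assumptions for illustration only.
from dataclasses import dataclass
from typing import List

@dataclass
class Keyframe:
    time: float            # seconds from gesture start
    wrist_xyz: tuple       # wrist position in a shoulder-centred frame
    velocity_scale: float  # relative stroke velocity

@dataclass
class Expressivity:
    spatial_extent: float   # -1 (contracted) .. +1 (expanded)
    temporal_extent: float  # -1 (slow) .. +1 (fast)
    power: float            # -1 (weak) .. +1 (tense, accelerated stroke)

def apply_expressivity(frames: List[Keyframe], e: Expressivity) -> List[Keyframe]:
    """Rescale a gesture's keyframes while keeping its trajectory shape."""
    out = []
    for f in frames:
        # Spatial extent widens or narrows the reach of the gesture.
        x, y, z = (c * (1.0 + 0.5 * e.spatial_extent) for c in f.wrist_xyz)
        # Temporal extent compresses or stretches the timing.
        t = f.time / (1.0 + 0.5 * e.temporal_extent)
        # Power accelerates or softens the stroke phase.
        v = f.velocity_scale * (1.0 + 0.5 * e.power)
        out.append(Keyframe(time=t, wrist_xyz=(x, y, z), velocity_scale=v))
    return out
```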
This paper describes the results of a research project aimed at implementing a 'realistic' 3D Embodied Agent that can be animated in real time and is 'believable and expressive': that is, able to coherently communicate complex information through the combination and tight synchronisation of verbal and nonverbal signals. We describe, in particular, how we 'animate' this Agent (which we called Greta) so as to enable her to manifest the affective states that are dynamically activated and de-activated in her mind during the dialog with the user. The system is made up of three tightly interrelated components: a representation of the Agent's Mind, which includes long- and short-term affective components (personality and emotions) and simulates how emotions are triggered and decay over time according to the Agent's personality and the context, and how several emotions may overlap (dynamic belief networks with weighting of goals is the formalism we employ for this purpose); a mark-up language to denote the communicative meanings that may be associated with dialog moves performed by the Agent; and a translation of the Agent's tagged move into a facial expression that appropriately combines the available channels (gaze direction, eyebrow shape, head direction and movement, etc.). The final output is a 3D facial model that respects the MPEG-4 standard and uses MPEG-4 Facial Animation Parameters to produce facial expressions. Throughout the paper, we illustrate the results obtained with an example of dialog in the domain of 'Advice about eating disorders'. The paper concludes with an analysis of the advantages of our cognitive model of emotion triggering and of the problems found in testing it. Although we have not yet completed a formal evaluation of our system, we briefly describe how we plan to assess the agent's believability in terms of the consistency of its communicative behavior.
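As a rough, illustrative stand-in for the emotion-triggering component (not the dynamic belief network formalism the paper actually uses), the following sketch captures the basic idea that emotions are raised by goal-relevant belief changes, weighted by personality, and decay over dialog turns; all names and values are assumptions.

```python
# Simplified stand-in for emotion triggering and decay: emotions rise when the
# believed achievement of a goal changes, scaled by personality weights, and
# fade over dialog turns. Goal and emotion labels are hypothetical.
import math

class EmotionState:
    def __init__(self, personality_weights, decay_rate=0.6):
        # personality_weights: how strongly each goal matters to this agent.
        self.w = personality_weights
        self.decay = decay_rate
        self.intensity = {}          # emotion label -> current intensity

    def trigger(self, emotion, goal, belief_change):
        """Raise an emotion when the believed state of a goal changes."""
        delta = self.w.get(goal, 0.0) * belief_change
        self.intensity[emotion] = min(1.0, self.intensity.get(emotion, 0.0) + delta)

    def step(self):
        """Advance one dialog turn: every active emotion decays exponentially."""
        for e in list(self.intensity):
            self.intensity[e] *= math.exp(-self.decay)
            if self.intensity[e] < 0.05:
                del self.intensity[e]

# Example: "sorry-for" is triggered when the agent believes the user's health
# goal has become threatened, then fades over subsequent turns.
state = EmotionState({"preserve_health_of_user": 0.9})
state.trigger("sorry_for", "preserve_health_of_user", belief_change=0.8)
state.step()
```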
This paper reports results from a program that produces high quality animation of facial expressions and head movements as automatically as possible in conjunction with meaning-based speech synthesis, including spoken intonation. The goal of the research is as much to test and define our theories of the formal semantics for such gestures, as to produce convincing animation. Towards this end we have produced a high level programming language for 3D animation of facial expressions. We have been concerned primarily with expressions conveying information correlated with the intonation of the voice: this includes the differences of timing, pitch, and emphasis that are related to such semantic distinctions of discourse as "focus", "topic" and "comment", "theme" and "rheme", or "given" and "new" information. We are also interested in the relation of affect or emotion to facial expression. Until now, systems have not embodied such rule-governed translation from spoken utterance meaning to facial expressions. Our system embodies rules that describe and coordinate these relations: intonation/information, intonation/affect and facial expressions/affect. A meaning representation includes discourse information: what is contrastive/background information in the given context, and what is the "topic" or "theme" of the discourse. The system maps the meaning representation into how accents and their placement are chosen, how they are conveyed over facial expression and how speech and facial expressions are coordinated. This determines a sequence of functional groups: lip shapes, conversational signals, punctuators, regulators or manipulators. Our algorithms then impose synchrony, create coarticulation effects, and determine affectual signals, eye and head movements. The lowest level representation is the Facial Action Coding System (FACS), which makes the generation system portable to other facial models.
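The following sketch illustrates the kind of rule-governed mapping described above, from discourse and intonation features to categories of facial signals grounded in FACS action units. The specific rules and AU choices (AU1/AU2 brow raisers, AU45 blink) are illustrative assumptions, not the paper's rule set.

```python
# A minimal sketch of a rule-governed mapping: discourse/intonation features
# for one word -> facial signal categories -> FACS action units. The rules
# below are hypothetical examples of this style of mapping.

def facial_signals(word):
    """word: dict of intonational and discourse features for a single word."""
    actions = []
    # Conversational signal: a brow raise often accompanies a pitch accent
    # on new ("rhematic") information.
    if word.get("pitch_accent") and word.get("information") == "new":
        actions.append({"type": "conversational", "aus": ["AU1", "AU2"],
                        "sync": "accent"})
    # Punctuator: a blink at a phrase boundary.
    if word.get("phrase_final"):
        actions.append({"type": "punctuator", "aus": ["AU45"],
                        "sync": "boundary"})
    return actions

print(facial_signals({"pitch_accent": True, "information": "new",
                      "phrase_final": True}))
```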
Since the beginning of the SAIBA effort to unify key interfaces in the multi-modal behavior generation process, the Behavior Markup Language (BML) has gained ground as an important component in many projects worldwide while continuing to undergo further refinement. This paper reports on the progress made in the last year in further developing BML. It discusses some of the key challenges the effort is facing and reviews a number of projects that already make use of BML or support its use.
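For readers unfamiliar with BML, the snippet below gives a rough sense of its shape: behaviors that reference named sync points in co-occurring speech. Element and attribute names follow the later BML 1.0 draft and may differ from the version of the language discussed in the paper.

```python
import xml.etree.ElementTree as ET

# A hypothetical BML block: a beat gesture and a head nod, both aligned to a
# named sync point (tm1) inside the speech behavior.
BML_SNIPPET = """
<bml id="bml1">
  <speech id="s1">
    <text>Welcome to <sync id="tm1"/> our lab</text>
  </speech>
  <gesture id="g1" lexeme="BEAT" stroke="s1:tm1"/>
  <head id="h1" lexeme="NOD" start="s1:tm1"/>
</bml>
"""

root = ET.fromstring(BML_SNIPPET)
print([child.tag for child in root])   # -> ['speech', 'gesture', 'head']
```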