We describe an implemented system that automatically generates and animates conversations between multiple human-like agents with appropriate and synchronized speech, intonation, facial expressions, and hand gestures. Conversations are created by a dialogue planner that produces the text as well as the intonation of the utterances. The speaker/listener relationship, the text, and the intonation in turn drive facial expression, lip motion, eye gaze, head motion, and arm gesture generators. Coordinated arm, wrist, and hand motions are invoked to create semantically meaningful gestures. Throughout, we will use examples from an actual synthesized, fully animated conversation.
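As a rough illustration of the data flow this abstract outlines (not the system's actual implementation, which is described in later sections), the following Python sketch shows a dialogue planner emitting text annotated with intonation, and downstream generators synchronizing nonverbal behaviors to the intonationally prominent words. All class and function names here are hypothetical.

```python
# Hypothetical sketch of the generation pipeline; the names are our own
# illustrative inventions, not the system's actual modules.
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One planned utterance: speaker/listener roles, words, intonation."""
    speaker: str
    listener: str
    words: list[str]
    accented: set[int] = field(default_factory=set)  # pitch-accented word indices


def plan_dialogue() -> list[Utterance]:
    # Stand-in for the dialogue planner, which produces both the text
    # and the intonation (here, a set of accented words) of an utterance.
    return [Utterance("AgentA", "AgentB",
                      ["Do", "you", "have", "a", "blank", "check"],
                      accented={4, 5})]


def generate_behaviors(utt: Utterance) -> list[str]:
    # Stand-in for the facial-expression, gaze, head-motion, and gesture
    # generators: nonverbal events are produced word by word, with gesture
    # strokes aligned to the intonationally prominent words.
    events = []
    for i, word in enumerate(utt.words):
        events.append(f"{utt.speaker}: lip motion for '{word}'")
        if i in utt.accented:
            events.append(f"{utt.speaker}: gesture stroke on '{word}', "
                          f"gaze toward {utt.listener}")
    return events


if __name__ == "__main__":
    for utt in plan_dialogue():
        print("\n".join(generate_behaviors(utt)))
```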
Until now, theories of the gesture-speech relationship have been difficult to evaluate because of their descriptive basis. In this paper we provide a tool for investigating the relationship between speech and gesture: a system that generates speech, intonation, and gesture using two copies of an identical program that have different knowledge of the world and must cooperate to accomplish a goal. The output of the dialogue generation is fed into a three-dimensional interactive animated model: two graphic figures on a computer screen who gesture according to the rules given to the system. The advantage of computer modeling in this domain is that it forces us to formulate predictive theories of the gesture-speech relationship. A felicitous outcome is a working system that realizes autonomous animated conversational agents, for virtual reality and other purposes.
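To make the two-agent setup concrete, here is a minimal, hypothetical sketch: two copies of one program, initialized with different knowledge of the world, cooperate by supplying each other the facts needed to reach a shared goal. The bank-transaction flavor of the example, the agent names, and the fact representation are all illustrative assumptions, not the dialogue planner's actual machinery.

```python
# Minimal sketch (our own construction, not the paper's code) of two
# identical agents with different world knowledge cooperating on a goal.

class Agent:
    def __init__(self, name: str, knowledge: dict[str, str]):
        self.name = name
        self.knowledge = knowledge

    def knows(self, question: str) -> bool:
        return question in self.knowledge

    def answer(self, question: str) -> str:
        return self.knowledge[question]


def cooperate(goal: list[str], a: Agent, b: Agent) -> None:
    # Each fact needed for the goal is supplied by whichever agent knows
    # it (we assume one of them always does); every exchange becomes a
    # turn in the generated conversation.
    for question in goal:
        asker, answerer = (a, b) if not a.knows(question) else (b, a)
        print(f"{asker.name}: {question}?")
        print(f"{answerer.name}: {answerer.answer(question)}")


if __name__ == "__main__":
    teller = Agent("Teller", {"account balance": "enough to cover $50"})
    customer = Agent("Customer", {"amount wanted": "$50"})
    cooperate(["amount wanted", "account balance"], customer, teller)
```

In the full system, each such exchange would be realized as synthesized speech with intonation, and would in turn drive the gaze, facial-expression, and gesture generators for the two animated figures.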