Grounding is an important process that underlies all human interaction. Hence, it is crucial for building social robots that are expected to collaborate effectively with humans. Gaze behavior plays versatile roles in establishing, maintaining and repairing the common ground. Integrating all these roles in a computational dialog model is a complex task since gaze is generally combined with multiple parallel information modalities and involved in multiple processes for the generation and recognition of behavior. Going beyond related work, we present a modeling approach focusing on these multi-modal, parallel and bi-directional aspects of gaze that need to be considered for grounding and their interleaving with the dialog and task management. We illustrate and discuss the different roles of gaze as well as advantages and drawbacks of our modeling approach based on a first user study with a technically sophisticated shared workspace application with a social humanoid robot.
The outcome of interpersonal interactions depends not only on the contents that we communicate verbally, but also on nonverbal social signals. Because a lack of social skills is a common problem for a significant number of people, serious games and other training environments have recently become the focus of research. In this work, we present NovA (Nonverbal behavior Analyzer), a system that analyzes and facilitates the interpretation of social signals automatically in a bidirectional interaction with a conversational agent. It records data of interactions, detects relevant social cues, and creates descriptive statistics for the recorded data with respect to the agent's behavior and the context of the situation. This enhances the possibilities for researchers to automatically label corpora of human-agent interactions and to give users feedback on strengths and weaknesses of their social behavior.
In this paper, we present a novel approach to the combined modeling of multimodal fusion and interaction management. The approach is based on a declarative multimodal event logic that allows the integration of inputs distributed over multiple modalities in accordance with spatial, temporal and semantic constraints. In conjunction with a visual state chart language, our approach supports the incremental parsing and fusion of inputs and a tight coupling with interaction management. The incremental and parallel parsing approach allows us to cope with concurrent continuous and discrete interactions and with fusion on different levels of abstraction. The high-level visual and declarative modeling methods support rapid prototyping and iterative development of multimodal systems.
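To make the idea of fusing inputs under temporal and semantic constraints concrete, the following is a minimal sketch in Python. It is not the event logic or the state chart language described in the abstract; the `Event` type, the `DEICTIC` word set, the `max_gap` tolerance and the function names are all hypothetical and chosen only for illustration. The sketch pairs a deictic spoken word with a pointing gesture when the two occur close enough in time.

```python
from dataclasses import dataclass

# Hypothetical set of words that need a referent from another modality
# (a stand-in for a semantic constraint).
DEICTIC = {"this", "that", "there"}


@dataclass(frozen=True)
class Event:
    modality: str   # e.g. "speech" or "gesture" (assumed labels)
    start: float    # start time in seconds
    end: float      # end time in seconds
    content: str    # recognized word, or id of the pointed-at object


def temporally_close(a: Event, b: Event, max_gap: float = 0.5) -> bool:
    """Temporal constraint: intervals overlap or lie at most max_gap apart."""
    gap = max(a.start, b.start) - min(a.end, b.end)
    return gap <= max_gap


def fuse(events: list[Event], max_gap: float = 0.5) -> list[tuple[str, str]]:
    """Pair each deictic speech event with the closest compatible gesture."""
    speech = [e for e in events
              if e.modality == "speech" and e.content in DEICTIC]
    gestures = [e for e in events if e.modality == "gesture"]
    fused = []
    for s in speech:
        candidates = [g for g in gestures if temporally_close(s, g, max_gap)]
        if candidates:
            # Prefer the gesture whose onset is nearest to the spoken word.
            best = min(candidates, key=lambda g: abs(g.start - s.start))
            fused.append((s.content, best.content))
    return fused


events = [
    Event("speech", 1.0, 1.4, "this"),
    Event("gesture", 1.2, 1.8, "red_block"),
    Event("speech", 2.0, 2.3, "take"),
    Event("gesture", 5.0, 5.5, "blue_block"),
]
result = fuse(events)
```

This sketch processes a complete batch of events; an incremental variant, as the abstract describes, would instead update candidate pairings as each new event arrives.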