International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction 2010
DOI: 10.1145/1891903.1891910

Facilitating multiparty dialog with gaze, gesture, and speech

Abstract: We study how synchronized gaze, gesture and speech rendered by an embodied conversational agent can influence the flow of conversations in multiparty settings. We review a computational framework for turn taking that provides the foundation for tracking and communicating intentions to hold, release, or take control of the conversational floor. We then present details of the implementation of the approach in an embodied conversational agent and describe experiments with the system in a shared task setting. Fina…
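To make the floor-control vocabulary concrete, here is a minimal Python sketch of the hold/release/take intentions the abstract mentions. The enum, function, and decision rule are illustrative assumptions, not the authors' implementation, which reasons over much richer multimodal evidence.

```python
# A minimal sketch (not the authors' implementation) of the floor-control
# actions the abstract describes: an agent tracks whether to hold, release,
# or take the conversational floor and renders matching gaze/gesture/speech
# behaviors. All names here are illustrative.
from enum import Enum, auto

class FloorAction(Enum):
    HOLD = auto()     # keep the floor (e.g., avert gaze, continue speaking)
    RELEASE = auto()  # yield the floor (e.g., gaze at addressee, fall silent)
    TAKE = auto()     # claim the floor (e.g., gaze + gesture + start speaking)
    NULL = auto()     # no floor action (listen passively)

def choose_floor_action(agent_has_floor: bool,
                        agent_wants_floor: bool,
                        user_is_speaking: bool) -> FloorAction:
    """Pick a floor-control action from simple binary cues.

    A toy decision rule standing in for the tracked intentions the abstract
    mentions; the real framework infers these from multimodal evidence.
    """
    if agent_has_floor:
        return FloorAction.HOLD if agent_wants_floor else FloorAction.RELEASE
    if agent_wants_floor and not user_is_speaking:
        return FloorAction.TAKE
    return FloorAction.NULL
```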

Cited by 123 publications (77 citation statements) · References 18 publications
“…Early work on developing architectures to manage this problem considers how nonverbal cues used by virtual agents on a screen can affect perception of lifelikeness (Cassell & Thorisson, 1999). More recent work on engagement with virtual agents uses more elaborate turn-taking models and supports multiparty conversation (Bohus & Horvitz, 2010). Research in spoken dialog systems also attempts to control the timing of turn-taking over the single modality of speech (Raux & Eskenazi, 2009).…”
Section: Related Work
confidence: 99%
“…Next, it asks the user for an order (action ask_order), which is followed by the act of listening for the order (expectation order(X)). Once the order has been provided, the robot consults the multi-DOA estimation module if any DOAs were detected during the act of listening (expectation dirs(As)), filtering them for consistency. Depending upon the number of consistent DOAs detected, one of the following situations may be triggered: a) if no consistent DOAs were detected (situation A([])), it accepts the order and asks whether the user wants something else, but it does not face the user; b) if only one consistent DOA was detected (situation A([A])), it accepts the order, faces the user and asks the user whether he/she wants something else; or c) if more than one consistent DOA was detected (situation G(As)), it rejects the order, adds the DOAs to Ps (action push([A,B,...], Ps)), tells the users to speak one at a time, and returns to the initial situation to retake the order while providing Ps as an argument, which results in the robot facing each consistent DOA and taking an order for each one.…”
Section: goto(d)
confidence: 99%
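The branching in the quoted passage can be sketched in Python. The situation and action names (ask_order, A([]), A([A]), G(As), push) come from the quote; the function signature and the action-string representation of the plan are illustrative assumptions, not the cited system's API.

```python
# A rough Python re-rendering of the branching logic in the quoted passage.
# Representing the resulting plan as a list of action strings is an
# illustrative choice; the cited system executes these as robot behaviors.
def order_plan(order: str, doas: list, pending: list) -> list:
    """Return the actions triggered by the number of consistent DOAs."""
    if len(doas) == 0:                     # situation A([])
        # accept and ask for more, without facing the user
        return [f"accept({order})", "ask_anything_else"]
    if len(doas) == 1:                     # situation A([A])
        # accept, face the single detected speaker, ask for more
        return [f"accept({order})", f"face({doas[0]})", "ask_anything_else"]
    # situation G(As): reject, remember the DOAs, retake an order per speaker
    pending.extend(doas)                   # action push([A,B,...], Ps)
    plan = [f"reject({order})", "say('Please speak one at a time.')"]
    for doa in pending:
        plan += [f"face({doa})", "ask_order"]  # retake an order for each DOA
    return plan

# Example: two overlapping speakers detected at 30 and 120 degrees.
print(order_plan("coffee", [30, 120], []))
```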
“…Bohus and Horvitz [5] developed a system capable of differentiating speakers in a turn-based speaking environment. The system was able to determine who was speaking to whom by evaluating hand gestures and other cues.…”
Section: Related Work
confidence: 99%
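As a hedged illustration of the cue-based addressee inference the quote attributes to the system, here is a toy Python scoring sketch; the cue names, weights, and functions are assumptions for exposition, not values or code from the paper.

```python
# A toy sketch of combining multimodal cues to decide who is speaking to
# whom, in the spirit of the quoted description. Weights are illustrative.
def addressee_score(gaze_on_target: float,
                    gesture_toward_target: float,
                    lexical_address_cue: float) -> float:
    """Combine normalized cues (each in [0, 1]) into a single score."""
    weights = (0.5, 0.3, 0.2)  # illustrative weighting, not from the paper
    cues = (gaze_on_target, gesture_toward_target, lexical_address_cue)
    return sum(w * c for w, c in zip(weights, cues))

def infer_addressee(candidates: dict) -> str:
    """Pick the candidate with the highest combined cue score.

    `candidates` maps participant id -> (gaze, gesture, lexical) cue tuple.
    """
    return max(candidates, key=lambda p: addressee_score(*candidates[p]))

# Example with made-up cue values for two participants.
participants = {"Alice": (0.9, 0.2, 0.1), "Bob": (0.3, 0.8, 0.0)}
print(infer_addressee(participants))  # -> "Alice" under these toy cues
```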