Abstract: In this paper we present our ongoing work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing and visual perception of a user, which includes the recognition of pointing gestures as well as the recognition of a person's head orientation. Each of the components is described in the paper and experimental results are presented. In order to demonstrate and measure the usefulness of such technologies for human-robot interaction, all components have been integrated on a mobile robot platform and have been used for real-time human-robot interaction in a kitchen scenario.
In this paper, we present our work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, as well as the recognition of a person's head orientation. Each of the components is described in the paper and experimental results are presented. We also present several experiments on multimodal human-robot interaction, such as interaction using speech and gestures, the automatic determination of the addressee during human-human-robot interaction, as well as interactive learning of dialogue strategies. The work and the components presented here constitute the core building blocks for audiovisual perception of humans and multimodal human-robot interaction used for the humanoid robot developed within the German research project (Sonderforschungsbereich) on humanoid cooperative robots.
This paper presents an architecture for the fusion of multimodal input streams for natural interaction with a humanoid robot, as well as results from a user study with our system. The presented fusion architecture consists of an application-independent parser of input events and application-specific rules. In the presented user study, people could interact with a robot in a kitchen scenario using speech and gesture input. In the study, we observed that our fusion approach is very tolerant of falsely detected pointing gestures. This is because we use speech as the main modality and pointing gestures mainly for the disambiguation of objects. In the paper, we also report on the temporal correlation of speech and gesture events as observed in the user study.
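The abstract does not give implementation details, but the described fusion strategy (speech as the primary modality, with pointing gestures used only to resolve deictic references within a short time window) could be sketched roughly as below. All class names, fields, and the correlation window are illustrative assumptions, not the authors' actual interfaces.

```python
from dataclasses import dataclass, field

# Assumed fusion window for correlating speech and gesture events (seconds).
GESTURE_WINDOW_S = 2.0

@dataclass
class InputEvent:
    modality: str      # "speech" or "gesture"
    timestamp: float   # seconds
    payload: dict      # e.g. {"command": "bring", "object": None} or {"target": "cup"}

@dataclass
class FusionEngine:
    pending_gestures: list = field(default_factory=list)

    def on_event(self, event: InputEvent):
        if event.modality == "gesture":
            # Gestures alone never trigger actions; they are only kept as
            # candidates for later disambiguation. This is what makes the
            # approach tolerant of falsely detected pointing gestures.
            self.pending_gestures.append(event)
            return None
        if event.modality == "speech":
            command = dict(event.payload)
            if command.get("object") is None:
                # Deictic reference ("bring me that"): resolve the object
                # from a temporally close pointing gesture, if one exists.
                gesture = self._closest_gesture(event.timestamp)
                if gesture is not None:
                    command["object"] = gesture.payload.get("target")
            return command  # forwarded to the application-specific rules

    def _closest_gesture(self, t: float):
        candidates = [g for g in self.pending_gestures
                      if abs(g.timestamp - t) <= GESTURE_WINDOW_S]
        return min(candidates, key=lambda g: abs(g.timestamp - t), default=None)
```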
In this paper, we present a novel approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multiview face detection and upper body detection. On the audio side, the time delays of arrival between pairs of microphones are estimated with a generalized cross correlation function. Computationally expensive features are evaluated only at the particles' projected positions in the respective camera images, thus the complexity of the proposed algorithm is low. We evaluated the system on data that was recorded during actual lectures. The results of our experiments were 36 cm average error for video only tracking, 46 cm for audio only, and 31 cm for the combined audio-video system.
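As a rough illustration of the joint particle filter described above, the following sketch propagates 3D location hypotheses, projects them into each camera view, and weights them with visual and TDOA-based audio scores before resampling. The camera interface, noise values, and likelihood models are placeholder assumptions and not the authors' implementation; in particular, the paper's face and upper-body detectors are collapsed here into a single foreground-likelihood placeholder.

```python
import numpy as np

N_PARTICLES = 500
MOTION_NOISE = 0.05      # metres, assumed random-walk noise
SPEED_OF_SOUND = 343.0   # m/s

def predict(particles):
    """Propagate 3D hypotheses with a simple random-walk motion model."""
    return particles + np.random.normal(0.0, MOTION_NOISE, particles.shape)

def video_score(particles, cameras, images):
    """Project each 3D hypothesis into every camera and evaluate the visual
    features only at the projected pixel positions (assumed camera API)."""
    scores = np.ones(len(particles))
    for cam, img in zip(cameras, images):
        pixels = cam.project(particles)                  # (N, 2) image coordinates
        scores *= cam.foreground_likelihood(img, pixels) # placeholder feature score
    return scores

def audio_score(particles, mic_pairs, observed_tdoas, tdoa_var=1e-7):
    """Compare the TDOA predicted by each hypothesis with the TDOA
    estimated from generalized cross correlation (variance is assumed)."""
    scores = np.ones(len(particles))
    for (m1, m2), tdoa in zip(mic_pairs, observed_tdoas):
        predicted = (np.linalg.norm(particles - m1, axis=1)
                     - np.linalg.norm(particles - m2, axis=1)) / SPEED_OF_SOUND
        scores *= np.exp(-((predicted - tdoa) ** 2) / tdoa_var)
    return scores

def update(particles, cameras, images, mic_pairs, tdoas):
    """One filter step: weight particles by joint audio-video score, resample."""
    weights = video_score(particles, cameras, images) * \
              audio_score(particles, mic_pairs, tdoas)
    weights /= weights.sum()
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```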
In prior work, we proposed using an extended Kalman filter to directly update position estimates in a speaker localization system based on time delays of arrival. We found that such a scheme provided superior tracking quality as compared with the conventional closed-form approximation methods. In this work, we enhance our audio localizer with video information. We propose an algorithm to incorporate detected face positions in different camera views into the Kalman filter without doing any explicit triangulation. This approach yields a robust source localizer that functions reliably both for segments wherein the speaker is silent, which would be detrimental for an audio only tracker, and wherein many faces appear, which would confuse a video only tracker. We tested our algorithm on a data set consisting of seminars held by actual speakers. Our experiments revealed that the audio-video localizer functioned better than a localizer based solely on audio or solely on video features.
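A minimal sketch of how TDOA measurements and per-camera face detections can be folded into an extended Kalman filter without explicit triangulation is given below. The measurement models, noise parameters, and the camera projection interface are assumptions for illustration rather than the paper's exact formulation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

class SpeakerEKF:
    """EKF over the speaker's 3D position; both audio (TDOA) and video
    (face detections) enter through nonlinear measurement functions."""

    def __init__(self, x0, P0, process_noise=0.01):
        self.x = np.asarray(x0, dtype=float)   # 3D position estimate
        self.P = np.asarray(P0, dtype=float)   # 3x3 covariance
        self.Q = process_noise * np.eye(3)     # assumed random-walk process noise

    def predict(self):
        # Position is kept, uncertainty grows with the process noise.
        self.P = self.P + self.Q

    def _update(self, z, h, H, R):
        # Generic EKF correction for a nonlinear measurement z = h(x).
        y = z - h
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(3) - K @ H) @ self.P

    def update_tdoa(self, mic_a, mic_b, tdoa, var=1e-8):
        """Fold in one time delay of arrival between a microphone pair."""
        da = np.linalg.norm(self.x - mic_a)
        db = np.linalg.norm(self.x - mic_b)
        h = np.array([(da - db) / SPEED_OF_SOUND])
        H = ((self.x - mic_a) / da - (self.x - mic_b) / db) / SPEED_OF_SOUND
        self._update(np.array([tdoa]), h, H.reshape(1, 3), var * np.eye(1))

    def update_face(self, camera, pixel, var=25.0):
        """Fold in a detected face position from one camera view.
        'camera' is an assumed object providing the projection of a 3D
        point and its Jacobian; no explicit triangulation is performed."""
        h = camera.project(self.x)       # predicted 2D face position
        H = camera.jacobian(self.x)      # 2x3 projection Jacobian
        self._update(np.asarray(pixel, dtype=float), h, H, var * np.eye(2))
```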