A robot's head is important both for directional sensors and, in human-directed robotics, as the single most visible interaction interface. However, designing a robot's head faces contradictory requirements when integrating powerful sensing with social expression. Further, reactions of the general public show that current head designs often cause negative user reactions and distract from the functional capabilities. Therefore, this contribution presents a novel anthropomorphic robot head called "Flobi", which combines state-of-the-art sensing functionality with an exterior that elicits a sympathetic emotional response. It can display primary and secondary emotions in a human-like way to enable intuitive human-robot interaction. To facilitate further research on facial appearance, the exterior is fully modular and replaceable. While the current state of the art still requires trade-offs when integrating sensing and social expression, Flobi has been designed to enable service robotic applications, with high-resolution, wide-angle stereo vision, gyroscope motion compensation and stereo audio. For ease of integration, the head is self-contained, including 18 actuators, sensors and control boards, all in a human-head-sized package.
A major challenge for the realization of intelligent robots is to supply them with cognitive abilities that allow ordinary users to program them easily and intuitively. One way of doing so is to teach work tasks by interactive demonstration. To make this effective and convenient for the user, the machine must be capable of establishing a common focus of attention and be able to use and integrate spoken instructions, visual perceptions, and non-verbal cues such as gestural commands. We report progress in building a hybrid architecture that combines statistical methods, neural networks, and finite state machines into an integrated system for instructing grasping tasks by man-machine interaction. The system combines the GRAVIS-robot for visual attention and gestural instruction with an intelligent interface for speech recognition and linguistic interpretation, and a modality fusion module to allow multi-modal task-oriented man-machine communication with respect to dextrous robot manipulation of objects.
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to simultaneously learn the names and appearances of the objects. Only a small fraction of local features within any given image are associated with a particular caption word, and captions may contain irrelevant words not associated with any image object. We propose a novel algorithm that uses the repetition of feature neighborhoods across training images and a measure of correspondence with caption words to learn meaningful feature configurations (representing named objects). We also introduce a graph-based appearance model that captures some of the structure of an object by encoding the spatial relationships among the local visual features. In an iterative procedure, we use language (the words) to drive a perceptual grouping process that assembles an appearance model for a named object. Results of applying our method to three data sets in a variety of conditions demonstrate that, from complex, cluttered, real-world scenes with noisy captions, we can learn both the names and appearances of objects, resulting in a set of models invariant to translation, scale, orientation, occlusion, and minor changes in viewpoint or articulation. These named models, in turn, are used to automatically annotate new, uncaptioned images, thereby facilitating keyword-based image retrieval.
If robots are to succeed in novel tasks, they must be able to learn from humans. To improve such human-robot interaction, a system is presented that provides dialog structure and engages the human in an exploratory teaching scenario. We specifically target untrained users, who are supported by mixed-initiative interaction using verbal and non-verbal modalities. We present the principles of dialog structuring based on an object learning and manipulation scenario. System development follows an interactive evaluation approach, and we present both an extensible, event-based interaction architecture to realize mixed-initiative interaction and evaluation results based on a video study of the system. We show that users benefit from the provided dialog structure, which results in predictable and successful human-robot interaction.