Abstract-Multimodal attention is a key requirement for humanoid robots that must navigate complex environments and act as social, cognitive partners for humans. To this end, robots have to incorporate attention mechanisms that focus processing on the potentially most relevant stimuli while controlling the sensor orientation to improve the perception of these stimuli. In this paper, we present our implementation of audio-visual saliency-based attention, which we integrated into a system for knowledge-driven audio-visual scene analysis and object-based world modeling. For this purpose, we introduce a novel isophote-based method for proto-object segmentation of saliency maps, a surprise-based auditory saliency definition, and a parametric 3-D model for multimodal saliency fusion. The applicability of the proposed system is demonstrated in a series of experiments.

Index Terms-audio-visual saliency, auditory surprise, isophote-based visual proto-objects, parametric 3-D saliency model, object-based inhibition of return, multimodal attention, scene exploration, hierarchical object analysis, overt attention, active perception
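Surprise-based auditory saliency is commonly built on the Bayesian-surprise idea: score each new observation by the KL divergence between the belief about the signal statistics before and after observing it. The following is a minimal generic sketch of that idea over a spectrogram, assuming a running Gaussian belief per frequency band; the function and parameter names are hypothetical and this is not the paper's exact formulation:

```python
import numpy as np

def auditory_surprise(spectrogram, decay=0.95, eps=1e-6):
    """Illustrative Bayesian-surprise saliency over a spectrogram.

    For each frequency band we keep a running Gaussian belief about the
    band's energy and score each new frame by the KL divergence between
    the belief before (prior) and after (posterior) observing it.
    """
    n_bands, n_frames = spectrogram.shape
    mean = np.zeros(n_bands)
    var = np.ones(n_bands)
    saliency = np.zeros(n_frames)
    for t in range(n_frames):
        x = spectrogram[:, t]
        # Posterior update: exponentially decayed running Gaussian.
        new_mean = decay * mean + (1.0 - decay) * x
        new_var = decay * var + (1.0 - decay) * (x - new_mean) ** 2 + eps
        # KL(prior || posterior) per band, summed, is the surprise score.
        kl = 0.5 * (np.log(new_var / var)
                    + (var + (mean - new_mean) ** 2) / new_var - 1.0)
        saliency[t] = kl.sum()
        mean, var = new_mean, new_var
    return saliency
```

Under this definition, a sudden onset in a previously quiet band produces a large belief update and hence a high surprise value, while stationary background noise is quickly absorbed into the prior and scores low.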
Abstract-Opto-acoustic scene analysis, i.e., the guided and autonomous exploration of the environment by means of acoustic and/or visual perception, is an important and challenging task for a humanoid robot. On the one hand, this perception ability is necessary to interact with humans in a human-like way. On the other hand, the robot's surroundings have to be analyzed continuously to enable the robot to fulfill its everyday tasks. The greatest challenge lies in the wide variety of perception tasks, e.g., the detection, tracking, and identification of persons and different types of objects. This leads to the need for adapted, task- and context-dependent perception modules with specific requirements and abilities. Taking these considerations into account, this paper presents a hierarchical, knowledge-oriented concept of a framework for opto-acoustic scene analysis. The focus of the work is on formal conditions on the one side and the practical realization of a real-time system on the other. The proposed framework has a modular structure and consists of a number of specialized perception modules. To reflect the knowledge-based structure of the framework, an object-oriented environment model is used to continuously insert, update, and remove information about the robot's surroundings. Besides analyzing the scene with reference to already known objects and persons, the proposed concept enables the robot to explore a (partially) unknown environment, with a focus on creating multimodal signatures for unknown objects and persons. These signatures are used to build a unique representation of the explored objects and enable the robot to recognize them at a later time.
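The continuously maintained environment model can be pictured as a keyed store with insert/update/remove bookkeeping over multimodal object signatures. A minimal generic sketch follows; all class and field names are hypothetical assumptions, not the paper's implementation:

```python
from dataclasses import dataclass, field
import time

@dataclass
class WorldObject:
    # Hypothetical entry in an object-oriented environment model.
    object_id: int
    signature: dict      # multimodal signature (e.g. visual + acoustic cues)
    position: tuple      # last known 3-D position
    last_seen: float = field(default_factory=time.time)

class EnvironmentModel:
    """Generic sketch of continuous insert/update/remove bookkeeping."""

    def __init__(self, timeout=30.0):
        self.objects = {}
        self.timeout = timeout

    def insert_or_update(self, obj):
        # New observations either create an entry or refresh an old one.
        obj.last_seen = time.time()
        self.objects[obj.object_id] = obj

    def prune(self):
        # Remove entries that have not been observed recently.
        now = time.time()
        self.objects = {k: o for k, o in self.objects.items()
                        if now - o.last_seen < self.timeout}
```

The stored signatures are what allow an object encountered during exploration to be re-identified later: a new percept is matched against the signatures in the model before a fresh entry is created.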
Abstract-We extend our work on an integrated object-based system for saliency-driven overt attention and knowledge-driven object analysis. We show how the amount of head movement required during scene analysis can be reduced while still focusing on all salient proto-objects in an order that strongly favors proto-objects with higher saliency. Furthermore, we integrated motion saliency and, as a consequence, adaptive predictive gaze control to allow for efficient gazing behavior on the ARMAR-III robot head. To evaluate our approach, we first collected a new data set that incorporates two robotic platforms, three scenarios, and different scene complexities. Second, we introduce measures for the effectiveness of active overt attention mechanisms in terms of saliency accumulation and required head motion. In this way, we are able to objectively demonstrate the effectiveness of the proposed multicriterial focus-of-attention selection.
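The evaluation measures described above trade off attended saliency against head movement. One plausible way to compute such a score is total attended saliency per unit of angular head travel; the sketch below uses that simple definition with hypothetical names, as an assumption rather than the paper's exact metric:

```python
import numpy as np

def attention_effectiveness(fixations):
    """Score a gaze sequence by accumulated saliency vs. head motion.

    `fixations` is a list of (saliency, pan_deg, tilt_deg) tuples in
    fixation order. Hypothetical definition: total attended saliency
    divided by total angular head travel in degrees.
    """
    saliency = np.array([f[0] for f in fixations], dtype=float)
    angles = np.array([f[1:] for f in fixations], dtype=float)
    # Angular travel between consecutive fixations (L2 over pan/tilt).
    travel = np.linalg.norm(np.diff(angles, axis=0), axis=1).sum()
    return saliency.sum() / max(travel, 1e-9)

# A greedy saliency-first order vs. one that also keeps head travel short:
greedy = [(0.9, 40, 0), (0.8, -35, 10), (0.7, 38, -5)]
nearby = [(0.9, 40, 0), (0.7, 38, -5), (0.8, -35, 10)]
print(attention_effectiveness(greedy), attention_effectiveness(nearby))
```

A multicriterial selection strategy aims to score well on both criteria at once: the second ordering above attends the same proto-objects with far less head travel at the cost of a mild deviation from strict saliency order.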
tfaip is a Python-based research framework for developing, structuring, and deploying Deep Learning projects powered by TensorFlow (Abadi et al., 2015) and is intended for scientists at universities or in organizations who research, develop, and optionally deploy Deep Learning models. tfaip enables both simple and complex implementation scenarios, such as image classification, object detection, text recognition, natural language processing, or speech recognition. Each scenario is highly configurable by parameters that can be modified directly via the command line or the API.
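As a rough illustration of the parameter-driven configuration idea described above, the sketch below exposes a dataclass of scenario parameters as command-line flags. This is a generic pattern, not tfaip's actual API; the class and field names are hypothetical:

```python
from dataclasses import dataclass, fields
import argparse

@dataclass
class TrainerParams:
    # Hypothetical scenario parameters; a real scenario would expose an
    # analogous parameter tree via both the CLI and the Python API.
    epochs: int = 10
    batch_size: int = 16
    learning_rate: float = 1e-3

def parse_cli(params):
    """Expose every dataclass field as a --flag so the same parameters
    can be overridden from the command line or set in code."""
    parser = argparse.ArgumentParser()
    for f in fields(params):
        default = getattr(params, f.name)
        parser.add_argument(f"--{f.name}", type=type(default), default=default)
    args = parser.parse_args()
    for f in fields(params):
        setattr(params, f.name, getattr(args, f.name))
    return params

params = parse_cli(TrainerParams())  # e.g. --epochs 50 --learning_rate 3e-4
print(params)
```

The benefit of this style is that every experiment is fully described by its parameter set, which makes runs reproducible and easy to sweep from the command line.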