Abstract: This work presents a multimodal bottom-up attention system for the humanoid robot iCub, in which the robot's decisions to move its eyes and neck are based on visual and acoustic saliency maps. We introduce a modular, distributed software architecture that fuses visual and acoustic saliency maps into one egocentric frame of reference. This system endows the iCub with an emergent exploratory behavior that reacts to combined visual and auditory saliency. The developed software modules provide a flexible foundation for the open iCub platform and for further experiments and developments, including higher levels of attention and representation of the peripersonal space.
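To make the fusion step concrete, the following sketch shows one plausible way to combine two saliency maps that have already been remapped to a shared egocentric grid and to pick a gaze target from the result. The grid resolution, the weights, and the fuse-then-argmax rule are illustrative assumptions, not the architecture described above.

```python
# A minimal sketch (not the authors' implementation) of fusing visual and
# acoustic saliency into one egocentric map. Grid size, weights, and the
# fusion rule are illustrative assumptions.
import numpy as np

AZ_BINS, EL_BINS = 180, 90   # hypothetical egocentric grid resolution

def fuse_saliency(visual, acoustic, w_vis=0.6, w_aud=0.4):
    """Combine two saliency maps already remapped to the same
    egocentric (elevation x azimuth) grid."""
    assert visual.shape == acoustic.shape == (EL_BINS, AZ_BINS)
    fused = w_vis * visual + w_aud * acoustic
    return fused / (fused.max() + 1e-9)   # normalize to [0, 1]

def select_gaze_target(fused):
    """Pick the most salient cell as the next gaze target (bottom-up)."""
    el, az = np.unravel_index(np.argmax(fused), fused.shape)
    return az, el   # indices into the egocentric grid
```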
To follow a goal-directed behavior, an autonomous agent must be able to acquire knowledge about the causality between its motor actions and the corresponding sensory feedback. Since the complexity of such sensorimotor relationships directly determines the cognitive resources required, this work proposes that it is important to keep the agent's sensorimotor relationships simple. This implies that the agent should be designed such that sensory consequences can be described and predicted in a simple manner. Living organisms implement this paradigm by adapting their sensory and motor systems specifically to their behavior and environment. As a result, they are able to predict sensorimotor consequences with a very limited amount of (metabolically expensive) nervous tissue. In this context, the present work proposes that advantageous artificial sensory and motor layouts can be evolved by rewarding the ability to predict self-induced stimuli through simple sensorimotor relationships. Experiments consider a simulated agent recording realistic visual stimuli from natural images. The obtained results demonstrate the ability of the proposed method to (i) synthesize visual sensorimotor structures adapted to an agent's environment and behavior, and (ii) serve as a computational model for testing hypotheses regarding the development of biological visual sensorimotor systems.
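As a rough illustration of rewarding predictability through simple sensorimotor relationships, the sketch below scores a candidate layout by how well a plain linear model predicts the stimulus changes the agent's own actions induce; a higher score could then drive the evolutionary search. The data shapes, the linear model, and the least-squares fit are assumptions for illustration, not the method evaluated in the experiments.

```python
# A minimal sketch, under strong simplifying assumptions, of scoring a
# sensorimotor layout by how well a *linear* model predicts self-induced
# stimulus changes.
import numpy as np

def predictability_score(actions, stimuli):
    """actions: (T, n_dof) motor commands; stimuli: (T+1, n_sensors)
    recorded responses. Fit delta_s ~ a @ W by least squares and return
    the negative mean squared prediction error, so layouts whose
    consequences are simpler to predict score higher."""
    deltas = stimuli[1:] - stimuli[:-1]                  # (T, n_sensors)
    W, *_ = np.linalg.lstsq(actions, deltas, rcond=None)  # (n_dof, n_sensors)
    residual = deltas - actions @ W
    return -np.mean(residual ** 2)
```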
Neural circuits that route motor activity to sensory structures play a fundamental role in perception. Their purpose is to aid basic cognitive processes by integrating knowledge about an organism's actions and predicting the perceptual consequences of those actions. This work develops a biologically inspired model of a visual stimulus prediction circuit and proposes a mathematical formulation for a computational implementation. We consider an agent with a visual sensory area consisting of an unknown rigid configuration of light-sensitive receptive fields that moves with respect to the environment according to a given number of degrees of freedom. From the agent's perspective, every movement induces an initially unknown change to the recorded stimulus. In line with evidence collected from studies on ontogenetic development and the plasticity of neural circuits, the proposed model adapts its structure with respect to the stimuli experienced during the execution of a set of exploratory actions. We discuss the tendency of the proposed model to organize itself such that the prediction function is built from a particularly sparse feedforward network, which requires a minimum amount of wiring and computational operations. We also observe a dualism between the organization of an intermediate layer of the network and the concept of self-similarity.
Index Terms: Corollary discharge, plasticity, reafference, self-similarity, sparse neural networks, visual stimulus prediction.
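The sparsity argument can be illustrated with a toy version of the circuit: for a rigid receptor array, a given movement often maps each receptor's next stimulus onto the current stimulus of a single other receptor, so the prediction network needs only one incoming wire per output unit. The correlation-based selection below is a hypothetical stand-in for the model's actual adaptation rule.

```python
# A minimal sketch of maximally sparse feedforward prediction: learn one
# source receptor per target receptor from exploratory data. The
# correlation criterion is an illustrative assumption.
import numpy as np

def learn_sparse_wiring(pre, post):
    """pre, post: (T, n_receptors) stimuli recorded before/after repeated
    executions of the same action. Returns wiring[i] = index of the single
    source receptor whose pre-movement activity best predicts target i."""
    pre_c = (pre - pre.mean(0)) / (pre.std(0) + 1e-9)
    post_c = (post - post.mean(0)) / (post.std(0) + 1e-9)
    corr = post_c.T @ pre_c / len(pre)   # (n_receptors, n_receptors)
    return corr.argmax(axis=1)           # one incoming wire per target

def predict(stimulus, wiring):
    """Feedforward prediction with a single connection per output unit."""
    return stimulus[wiring]
```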
This work adds the concept of object to an existing low-level attention system of the humanoid robot iCub. Objects are defined as clusters of SIFT visual features. When the robot first encounters an unknown object within a certain (small) distance from its eyes, it uses depth perception to store a cluster of the features that lie within an interval around that distance. Whenever a previously stored object crosses the robot's field of view again, it is recognized, mapped into an egocentric frame of reference, and gazed at. This mapping is persistent, in the sense that the object's identity and position are retained even when it is not visible to the robot. Features are stored and recognized in a bottom-up way. Experimental results on the humanoid robot iCub validate this approach. This work creates the foundation for linking the bottom-up attention system with top-down, object-oriented information provided by humans.
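Below is a minimal sketch of the store-and-recognize cycle the abstract describes, assuming OpenCV's SIFT implementation; the depth interval, ratio test, and match-count threshold are illustrative parameters rather than the values used on the iCub.

```python
# A hypothetical store/recognize cycle for SIFT feature clusters gated by
# depth. Parameter values and the object_store structure are assumptions.
import cv2
import numpy as np

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher()
object_store = {}   # name -> descriptor cluster (np.ndarray)

def store_object(name, image, depth_map, d_min=0.2, d_max=0.4):
    """Keep only SIFT features whose depth lies within a small interval
    near the eyes, and store them as one object cluster."""
    kps, descs = sift.detectAndCompute(image, None)
    keep = [d for kp, d in zip(kps, descs)
            if d_min <= depth_map[int(kp.pt[1]), int(kp.pt[0])] <= d_max]
    object_store[name] = np.array(keep)

def recognize(image, min_matches=10, ratio=0.75):
    """Return the first stored object with enough ratio-test matches."""
    _, descs = sift.detectAndCompute(image, None)
    for name, cluster in object_store.items():
        pairs = matcher.knnMatch(descs, cluster, k=2)
        good = [m for m, n in pairs if m.distance < ratio * n.distance]
        if len(good) >= min_matches:
            return name
    return None
```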