We present a system for sensorimotor audio-visual source localization on a mobile robot. We utilize a particle filter for the combination of audio-visual information and for the temporal integration of consecutive measurements. Although the system only measures the current direction of the source, the position of the source can be estimated because the robot is able to move and can therefore obtain measurements from different directions. These actions by the robot successively reduce uncertainty about the source’s position. An information gain mechanism is used for selecting the most informative actions in order to minimize the number of actions required to achieve accurate and precise position estimates in azimuth and distance. We show that this mechanism is an efficient solution to the action selection problem for source localization, and that it is able to produce precise position estimates despite simplified unisensory preprocessing. Because of the robot’s mobility, this approach is suitable for use in complex and cluttered environments. We present qualitative and quantitative results of the system’s performance and discuss possible areas of application.
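To make the abstract's core loop concrete, the following is a minimal sketch of bearing-only source localization with a particle filter and greedy information-gain action selection. It is not the authors' implementation: the planar state space, Gaussian bearing-noise model, candidate move set, and all numeric parameters are illustrative assumptions.

```python
"""Sketch: particle-filter source localization with information-gain action selection.
All models and parameters are assumptions for illustration, not taken from the paper."""
import numpy as np

rng = np.random.default_rng(0)

N = 500                                      # number of particles
particles = rng.uniform(-5.0, 5.0, (N, 2))   # hypotheses of the source position (x, y)
SIGMA = np.deg2rad(10.0)                     # assumed bearing-measurement noise (rad)

def bearing(robot_xy, src_xy):
    """Direction from the robot to a source position."""
    d = src_xy - robot_xy
    return np.arctan2(d[..., 1], d[..., 0])

def update(robot_xy, measured_bearing):
    """Weight particles by the likelihood of the measured direction, then resample."""
    global particles
    err = np.angle(np.exp(1j * (bearing(robot_xy, particles) - measured_bearing)))
    w = np.exp(-0.5 * (err / SIGMA) ** 2) + 1e-12
    w /= w.sum()
    idx = rng.choice(N, N, p=w)
    particles = particles[idx] + rng.normal(0, 0.05, (N, 2))   # small diffusion

def entropy(pts):
    """Histogram entropy of the particle cloud, used as an uncertainty proxy."""
    h, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=20, range=[[-5, 5], [-5, 5]])
    p = h.ravel() / h.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_info_gain(robot_xy, n_samples=20):
    """Monte-Carlo estimate of the expected entropy reduction after moving to
    robot_xy and taking one more bearing measurement."""
    h0 = entropy(particles)
    gains = []
    for src in particles[rng.choice(N, n_samples)]:
        z = bearing(robot_xy, src) + rng.normal(0, SIGMA)
        err = np.angle(np.exp(1j * (bearing(robot_xy, particles) - z)))
        w = np.exp(-0.5 * (err / SIGMA) ** 2) + 1e-12
        w /= w.sum()
        idx = rng.choice(N, N, p=w)
        gains.append(h0 - entropy(particles[idx]))
    return np.mean(gains)

# Greedy action selection: move to the candidate pose with the largest expected gain.
true_source = np.array([2.0, 1.5])           # hypothetical ground truth for the demo
robot = np.array([0.0, 0.0])
for step in range(5):
    candidates = robot + np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], float)
    robot = candidates[np.argmax([expected_info_gain(c) for c in candidates])]
    z = bearing(robot, true_source) + rng.normal(0, SIGMA)
    update(robot, z)
    print(step, robot, particles.mean(axis=0))
```

Because each measurement constrains only the direction of the source, the greedy step tends to move the robot sideways relative to the current estimate, so that successive bearings triangulate the distance as well as the azimuth.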
We investigated audiovisual interactions in motion perception with behavioral experiments testing both the influence of visual stimuli on auditory apparent motion perception and the influence of auditory stimuli on visual apparent motion perception. A set of loudspeakers, each with an LED mounted in the middle of its speaker cone, was arranged in a semicircle. Apparent motion streams were presented for each modality alone in the unimodal conditions. In the bimodal conditions, stimuli of the second modality were added to fill the temporal and spatial gaps of the sampled trajectory of the reference stream. The participants' task was to observe the quasi-naturalistic stimulus sequences and to perform a standard classification task. The addition of stimuli of the second modality indeed facilitated apparent motion perception: bimodal presentation increased the upper temporal interval up to which the stimuli could be separated in time while still being perceived as continuous motion. We interpret these results as evidence for an ecologically advantageous audiovisual motion integration mechanism that operates beyond the constraints of strict spatiotemporal coincidence. Functional considerations suggest that this mechanism may represent an amodal stage suited for the processing of both unimodal and bimodal signals.
We introduce an information-driven scene classification system that combines different types of knowledge derived from a domain ontology and a statistical model in order to analyze scenes based on recognized objects. The domain ontology structures and formalizes which kind of scene classes exist and which object classes occur in them. Based on this structure, an empirical analysis of annotations from the LabelMe image database results in a statistical domain description. Both forms of knowledge are utilized for determining which object class detector to apply to the current scene according to the principle of maximum information gain. All evidence is combined in a belief-based framework that explicitly takes into account the uncertainty inherent to the statistical model and the object detection process as well as the ignorance associated with the coarse granularity of ontological constraints. Finally, we present preliminary classification performance results for scenes from the LabelMe database.
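The detector-selection step described above can be illustrated with a small sketch. The scene classes, object classes, and co-occurrence probabilities below are invented placeholders (in the paper they are derived from the domain ontology and LabelMe annotations), and the simple Bayesian belief used here stands in for the paper's belief-based framework, which additionally models ignorance.

```python
"""Sketch: choosing which object detector to run next by maximum expected
information gain over the scene-class belief. All values are placeholders."""
import numpy as np

scenes = ["kitchen", "office", "street"]
objects = ["stove", "monitor", "car"]

# P(object present | scene class): illustrative values, not LabelMe statistics.
p_obj_given_scene = np.array([
    [0.90, 0.10, 0.02],   # kitchen
    [0.05, 0.85, 0.05],   # office
    [0.01, 0.05, 0.80],   # street
])

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def info_gain(scene_belief, obj_idx):
    """Mutual information between the scene class and the outcome
    (present / absent) of running the detector for one object class."""
    p_present = np.dot(scene_belief, p_obj_given_scene[:, obj_idx])
    # Posterior scene beliefs for each possible detector outcome (Bayes' rule).
    post_present = scene_belief * p_obj_given_scene[:, obj_idx]
    post_absent = scene_belief * (1 - p_obj_given_scene[:, obj_idx])
    post_present /= post_present.sum()
    post_absent /= post_absent.sum()
    return entropy(scene_belief) - (p_present * entropy(post_present)
                                    + (1 - p_present) * entropy(post_absent))

belief = np.full(len(scenes), 1.0 / len(scenes))   # uniform scene-class prior
gains = [info_gain(belief, j) for j in range(len(objects))]
best = int(np.argmax(gains))
print("run detector for:", objects[best], "expected gain (bits):", round(gains[best], 3))
```

After a detector is run, the belief over scene classes is updated with the observed outcome and the selection step is repeated, so that each additional detection is chosen to be maximally informative about the scene class.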
We conducted behavioral experiments on visual, auditory, and motor contributions to the human representation of space in virtual reality environments using an 'impossible-worlds paradigm'. The experiments were run with an omnidirectional locomotion input device, the 'Virtusphere', a rotatable 10-foot hollow sphere that allows a subject inside to walk in any direction for any distance while immersed in a virtual environment. Both the rotation of the sphere and the movement of the subject's head were tracked to render the subject's view within the virtual environment presented on a head-mounted display. Auditory features were dynamically processed so that sound sources were exactly aligned with visual objects. Using this experimental setup, the subjects were presented with 'impossible worlds', i.e., virtual environments with geometrical and topological properties that are physically impossible. In previous experiments we have shown that subjects are able to navigate inside these impossible worlds (Zetzsche et al., 2009), despite the fact that different interpretations of their spatial structure are in conflict, since there is no single (physically plausible) interpretation accounting for all sensory perceptions of the subjects. In the present study we manipulated these physically 'impossible' properties either in the visual or in the auditory domain (so that each modality supports one of the possible interpretations) and assessed how these manipulations affected the subjects' internal representations of space. We discuss our results with respect to auditory, visual, and motor contributions to the internal spatial representation, the interaction of modalities, and the implications for the notion of motor action as a linking element between the senses.