<p>Humans and many animals can selectively sample important parts of their visual surroundings to carry out daily activities such as foraging or finding prey or mates. Selective attention allows them to use the limited resources of the brain efficiently by deploying their sensory apparatus to collect data believed to be pertinent to the task at hand. Robots and other computational agents operating in dynamic environments are similarly exposed to a wide variety of stimuli, which they must process with limited sensory and computational resources. Developing computational models of visual attention has long been of interest because such models enable artificial systems to select the necessary information from complex and cluttered visual environments, reducing the data-processing burden. Biologically inspired computational saliency models have previously been used to sample a visual scene selectively, but they have limited capacity to deal with dynamic environments and no capacity to reason about uncertainty when planning their sampling strategy. These models typically treat contrast in colour, shape or orientation as salient and sample locations of a visual scene in descending order of salience. After each observation, the area around the sampled location is blocked by an inhibition-of-return mechanism to keep it from being revisited. This thesis generalises the traditional model of saliency by using an adaptive Kalman filter estimator to model an agent's understanding of the world, and it uses a utility-function-based approach to describe what the agent cares about in the visual scene. This allows the agent to adopt a richer set of perceptual strategies than is possible with the classical winner-take-all mechanism of the traditional saliency model. In contrast with the traditional approach, inhibition of return is achieved without implementing an extra mechanism on top of the underlying structure.
This thesis demonstrates five utility functions, each of which encapsulates a perceptual state that is valued by the agent. Each utility function thereby produces a distinct perceptual behaviour that is matched to particular scenarios. The resulting visual attention distribution of the five proposed utility functions is demonstrated on five real-life videos. In most of the experiments, pixel intensity has been used as the source of the saliency map. Because the proposed approach is independent of the saliency map used, it can be combined with other, more complex saliency-map models. Moreover, the underlying structure of the model is sufficiently general and flexible that it can serve as the basis of a new range of more sophisticated gaze control systems.</p>
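The traditional scheme described in the abstract (sample in descending order of salience, then block the neighbourhood of each sampled location) can be sketched roughly as follows. This is only an illustrative sketch: the function name `scan_path`, the square neighbourhood, and the `radius` parameter are assumptions made for the example and are not taken from the thesis.

```python
def scan_path(saliency, n_fixations, radius=1):
    """Return (row, col) fixations in descending order of salience.

    `saliency` is a list of equal-length rows of numbers. After each
    fixation, every cell within `radius` (a square neighbourhood, chosen
    here for simplicity) is suppressed so that it cannot win again.
    """
    # Work on a copy so the caller's map is left untouched.
    s = [list(row) for row in saliency]
    rows, cols = len(s), len(s[0])
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: pick the most salient remaining cell.
        r, c = max(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda rc: s[rc[0]][rc[1]],
        )
        fixations.append((r, c))
        # Inhibition of return: block the neighbourhood of the winner.
        for i in range(max(0, r - radius), min(rows, r + radius + 1)):
            for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                s[i][j] = float("-inf")
    return fixations


if __name__ == "__main__":
    demo_map = [
        [0.1, 0.2, 0.9],
        [0.3, 0.8, 0.1],
        [0.7, 0.1, 0.2],
    ]
    print(scan_path(demo_map, 3, radius=1))
```

Note that the blocking step here is a separate mechanism bolted onto the selection rule, which is the design the thesis contrasts itself with.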
When faced with a complicated visual scene, many animals, including humans, attend to important regions in a systematic, serial manner. The ability to orient rapidly towards an important region in a scene allows an organism to accomplish activities such as navigation, foraging and detecting possible prey or mates. Developing a computational model of visual attention has long been of interest because such models enable artificial systems to acquire information efficiently from complex and cluttered environments. Current computational models attend to an important region (usually one that is maximally different from its immediate neighbours) and then inhibit future viewing of that region in order to facilitate the distribution of visual attention. In this work we introduce the idea of an 'uncertainty map', which works in conjunction with the existing idea of a 'saliency map' to drive the system's attention. We demonstrate the distribution of visual attention by our model in simulation. We show that, despite its simplicity, our system distributes visual attention in a context-dependent manner that can be easily tuned to different environments.
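One way to picture how an uncertainty map might work alongside a saliency map is a per-location scalar Kalman filter. The sketch below is an assumption-laden illustration and not the authors' model: the random-walk process noise `q`, the observation noise `r`, the additive score `estimate + weight * variance`, the flattened one-dimensional scene, and the name `attend` are all hypothetical choices made for the example.

```python
def attend(scene, steps, q=0.05, r=0.01, weight=1.0):
    """Simulate attention over a static, flattened scene.

    scene  : true salience value at each location (observed without noise here)
    q      : assumed process noise added to every variance at each step
    r      : assumed observation noise used in the Kalman gain
    weight : assumed trade-off between estimated salience and uncertainty
    """
    n = len(scene)
    est = [0.0] * n  # estimated salience per location
    var = [1.0] * n  # uncertainty map: variance of each estimate
    path = []
    for _ in range(steps):
        # Predict: the world may have changed, so every variance grows.
        var = [p + q for p in var]
        # Select: score each location by estimate plus weighted uncertainty.
        scores = [x + weight * p for x, p in zip(est, var)]
        k = max(range(n), key=scores.__getitem__)
        # Update: scalar Kalman correction at the observed location only.
        gain = var[k] / (var[k] + r)
        est[k] += gain * (scene[k] - est[k])
        var[k] *= 1.0 - gain
        path.append(k)
    return path, est, var


if __name__ == "__main__":
    path, est, var = attend([0.9, 0.1, 0.5], steps=4)
    print(path)
```

In this sketch, observing a location collapses its variance, so its score drops and attention moves elsewhere without a separate blocking step; the variance then grows again over time, which lets a salient location be revisited. Changing `weight` or `q` shifts the balance between exploring uncertain locations and returning to salient ones.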