Many experiments have shown that the human visual system makes extensive use of contextual information to facilitate object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an original approach to attentional guidance by global scene context. The model comprises two parallel pathways: one pathway computes local features (saliency) and the other computes global (scene-centered) features. The contextual guidance model of attention combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.

Keywords: eye movements, visual search, context, global feature, Bayesian model

According to feature-integration theory (Treisman & Gelade, 1980), the search for objects requires slow serial scanning because attention is necessary to integrate low-level features into single objects. Current computational models of visual attention based on saliency maps have been inspired by this approach, as it allows a simple and direct implementation of bottom-up attentional mechanisms that are not task specific. Computational models of image saliency (Itti, Koch, & Niebur, 1998; Koch & Ullman, 1985; Parkhurst, Law, & Niebur, 2002; Rosenholtz, 1999) provide some predictions about which regions are likely to attract observers' attention. These models work best in situations in which the image itself provides little semantic information and in which no specific task is driving the observer's exploration. In real-world images, however, the semantic content of the scene, the co-occurrence of objects, and task constraints have been shown to play a key role in modulating where attention and eye movements go.
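For concreteness, the two-pathway combination described in this abstract can be summarized by a Bayesian decomposition of the following form. This is a sketch in notation commonly used for such models; the symbols O, X, L, and G are our labels, not quoted from the paper:

    % Sketch of a Bayesian decomposition consistent with the two-pathway
    % model described above (symbols assumed, not quoted from the source):
    % O = target object present, X = image location,
    % L = local features at X, G = global (scene-centered) features.
    \[
    p(O = 1, X \mid L, G) \;\propto\;
    \underbrace{\frac{1}{p(L \mid G)}}_{\text{bottom-up saliency}}\;
    \underbrace{p(X \mid O = 1, G)}_{\text{contextual guidance}}\;
    \underbrace{p(O = 1 \mid G)}_{\text{scene prior}}
    \]

The first factor is the saliency pathway (locally rare features score high given the scene statistics), while the remaining two factors carry the global, scene-centered guidance: where the object tends to appear in scenes of this kind, and how likely the object is to be present at all.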
Expanding on the seminal work of G. T. Buswell (1935) and A. L. Yarbus (1967), we investigated how task instruction influences specific parameters of eye movement control. In the present study, 20 participants viewed color photographs of natural scenes under two instruction sets: visual search and memorization. Results showed that task influenced a number of eye movement measures, including the number of fixations and gaze duration on specific objects. Additional analyses revealed that the areas fixated were qualitatively different between the two tasks. However, other measures, such as average saccade amplitude and individual fixation durations, remained constant across the viewing of the scene and across tasks. The present study demonstrates that viewing task biases the selection of scene regions and aggregate measures of fixation time on those regions but does not influence other measures, such as the duration of individual fixations.
Current computational models of visual attention focus on bottom-up information and ignore scene context. However, studies in visual cognition show that humans use context to facilitate object detection in natural scenes by directing their attention or eyes to diagnostic regions. Here we propose a model of attention guidance based on global scene configuration. We show that the statistics of low-level features across the scene image predict where a specific object (e.g., a person) is likely to be located. Human eye movements show that the regions chosen by the top-down model agree with the regions scrutinized by human observers performing a visual search task for people. The results validate the proposition that top-down information from visual context modulates the saliency of image regions during the task of object detection. Contextual information provides a shortcut for efficient object detection systems.
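As a rough illustration of how such top-down modulation can be wired up, the sketch below multiplicatively combines a normalized bottom-up saliency map with a context-derived location prior. This is an assumed toy implementation, not the authors' code; the function names, the gamma exponent, and the Gaussian band prior are all illustrative choices.

    # Hedged sketch (not the authors' code): combine a bottom-up saliency
    # map with a context-based location prior, as the abstract describes.
    import numpy as np

    def combine_maps(saliency, context_prior, gamma=0.5):
        """Multiplicatively combine a normalized saliency map with a
        context prior over image locations. `gamma` (an assumed knob)
        flattens the saliency term so it does not dominate the prior."""
        s = saliency / saliency.sum()
        c = context_prior / context_prior.sum()
        guided = (s ** gamma) * c
        return guided / guided.sum()

    # Toy usage: random saliency plus a horizontal-band prior
    # (e.g., "people tend to appear near the ground plane").
    h, w = 60, 80
    saliency = np.random.rand(h, w)
    rows = np.exp(-0.5 * ((np.arange(h) - 40) / 8.0) ** 2)  # band near row 40
    context_prior = np.tile(rows[:, None], (1, w))
    attention_map = combine_maps(saliency, context_prior)

The multiplicative combination means a region must be both locally salient and contextually plausible to rank high, which is the behavior the abstract attributes to human observers searching for people.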
The size of the perceptual span (or the span of effective vision) in older readers was examined with the moving window paradigm (G. W. McConkie & K. Rayner, 1975). Two experiments demonstrated that older readers have a smaller and more symmetric span than that of younger readers. These two characteristics (a smaller and more symmetric span) of older readers may be a consequence of their less efficient processing of nonfoveal information, which results in a riskier reading strategy.
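For readers unfamiliar with the paradigm, the sketch below shows the core idea of a gaze-contingent moving window: on each fixation, text outside a window around the fixated character is masked. This is an illustrative reconstruction, not the authors' experimental code; the character-level masking, the span parameters, and the mask character are assumptions.

    # Hedged sketch (assumed, illustrative) of the moving window paradigm
    # of McConkie & Rayner (1975): characters outside a window around the
    # current fixation are masked; spaces are left intact.
    def moving_window(line, fixation_index, left_span, right_span, mask="x"):
        """Return the line as the reader would see it: legible inside the
        window, masked outside. Asymmetric spans model the rightward bias
        of the perceptual span typically found in younger readers."""
        out = []
        for i, ch in enumerate(line):
            inside = (fixation_index - left_span) <= i <= (fixation_index + right_span)
            out.append(ch if inside or ch == " " else mask)
        return "".join(out)

    # Fixating the 10th character with a span of 4 left / 14 right:
    print(moving_window("The quick brown fox jumps over the lazy dog", 9, 4, 14))

By shrinking the window until reading is disrupted, experimenters can estimate the span of effective vision; the asymmetry reported in the abstract corresponds to older readers tolerating a narrower, less rightward-biased window.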