Many experiments have shown that the human visual system makes extensive use of contextual information to facilitate object search in natural scenes. However, the question of how to formally model contextual influences remains open. On the basis of a Bayesian framework, the authors present an original approach to attentional guidance by global scene context. The model comprises 2 parallel pathways; one pathway computes local features (saliency) and the other computes global (scene-centered) features. The contextual guidance model of attention combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.

Keywords: eye movements, visual search, context, global feature, Bayesian model

According to feature-integration theory (Treisman & Gelade, 1980), the search for objects requires slow serial scanning because attention is necessary to integrate low-level features into single objects. Current computational models of visual attention based on saliency maps have been inspired by this approach, as it allows a simple and direct implementation of bottom-up attentional mechanisms that are not task specific. Computational models of image saliency (Itti, Koch, & Niebur, 1998; Koch & Ullman, 1985; Parkhurst, Law, & Niebur, 2002; Rosenholtz, 1999) provide some predictions about which regions are likely to attract observers' attention. These models work best in situations in which the image itself provides little semantic information and in which no specific task is driving the observer's exploration. In real-world images, the semantic content of the scene, the co-occurrence of objects, and task constraints have been shown to play a key role in modulating where attention and eye movements go.
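The combination this abstract describes can be pictured as a pixelwise product of a bottom-up saliency term and a context prior over target locations. The following is a minimal numpy sketch of that idea, assuming saliency is modeled as the inverse likelihood of the local features; the function name, the flattening exponent gamma, and the synthetic inputs are illustrative assumptions, not the published implementation.

```python
import numpy as np

def contextual_guidance_map(local_likelihood, context_prior, gamma=0.3):
    """Sketch of a Bayesian guidance map (illustrative, not the
    published implementation).

    Bottom-up saliency is taken to be the inverse likelihood of local
    features (rare features are salient), raised to a flattening
    exponent, then modulated by a prior over target locations derived
    from the global scene description.
    """
    saliency = local_likelihood ** (-gamma)  # rare local features -> high saliency
    guidance = saliency * context_prior      # context gates where saliency matters
    return guidance / guidance.sum()         # normalize into a fixation-prediction map

# Toy usage: random feature likelihoods plus a horizontal-band context
# prior (e.g., "people tend to appear near the sidewalk region").
rng = np.random.default_rng(0)
local_p = rng.uniform(0.01, 1.0, size=(64, 64))
rows = np.arange(64)[:, None]
context = np.exp(-((rows - 40) ** 2) / (2 * 8.0 ** 2)) * np.ones((1, 64))
guide = contextual_guidance_map(local_p, context)
print(guide.shape, round(float(guide.sum()), 6))  # (64, 64) 1.0
```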
Expanding on the seminal work of G. T. Buswell (1935) and A. L. Yarbus (1967), we investigated how task instruction influences specific parameters of eye movement control. In the present study, 20 participants viewed color photographs of natural scenes under two instruction sets: visual search and memorization. Results showed that task influenced a number of eye movement measures, including the number of fixations and gaze duration on specific objects. Additional analyses revealed that the areas fixated were qualitatively different between the two tasks. However, other measures, such as average saccade amplitude and individual fixation durations, remained constant across the viewing of the scene and across tasks. The present study demonstrates that viewing task biases the selection of scene regions and aggregate measures of fixation time on those regions but does not influence other measures, such as the duration of individual fixations.
Current computational models of visual attention focus on bottom-up information and ignore scene context. However, studies in visual cognition show that humans use context to facilitate object detection in natural scenes by directing their attention or eyes to diagnostic regions. Here we propose a model of attention guidance based on global scene configuration. We show that the statistics of low-level features across the scene image determine where a specific object (e.g., a person) should be located. Human eye movements show that regions chosen by the top-down model agree with regions scrutinized by human observers performing a visual search task for people. The results validate the proposition that top-down information from visual context modulates the saliency of image regions during the task of object detection. Contextual information provides a shortcut for efficient object detection systems.
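One way to make "the statistics of low-level features across the scene image" concrete is a gist-style descriptor: oriented gradient energy pooled over a coarse spatial grid, so that scenes with similar layouts produce similar feature vectors. The sketch below is a simplified stand-in, assuming plain image gradients in place of the multiscale oriented filter bank such models actually use; all names and parameters are illustrative.

```python
import numpy as np

def global_scene_features(image, n_blocks=4, orientations=(0, 45, 90, 135)):
    """Crude gist-style global descriptor (illustrative stand-in).

    Pools gradient energy by orientation band over an n_blocks x
    n_blocks grid, yielding a low-dimensional summary of scene layout
    that can index where a target object is likely to be located.
    """
    img = image.astype(float)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)                      # gradient energy
    ang = np.degrees(np.arctan2(gy, gx)) % 180  # orientation in [0, 180)
    h, w = img.shape
    feats = []
    for ori in orientations:
        # keep energy whose orientation lies within +/-22.5 deg of `ori`
        dist = np.abs(((ang - ori + 90) % 180) - 90)
        band = mag * (dist < 22.5)
        for by in range(n_blocks):
            for bx in range(n_blocks):
                block = band[by * h // n_blocks:(by + 1) * h // n_blocks,
                             bx * w // n_blocks:(bx + 1) * w // n_blocks]
                feats.append(block.mean())
    return np.array(feats)  # len(orientations) * n_blocks**2 values

# Toy usage: a synthetic horizon concentrates energy in one orientation
# band within the middle rows of the pooling grid.
img = np.zeros((64, 64))
img[32:, :] = 1.0
print(global_scene_features(img).shape)  # (64,)
```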
The size of the perceptual span (or the span of effective vision) in older readers was examined with the moving window paradigm (G. W. McConkie & K. Rayner, 1975). Two experiments demonstrated that older readers have a smaller and more symmetric span than that of younger readers. These 2 characteristics (smaller and more symmetric span) of older readers may be a consequence of their less efficient processing of nonfoveal information, which results in a riskier reading strategy.
What role does the initial glimpse of a scene play in subsequent eye movement guidance? In 4 experiments, a brief scene preview was followed by object search through the scene via a small moving window that was tied to fixation position. Experiment 1 demonstrated that the scene preview resulted in more efficient eye movements compared with a control preview. Experiments 2 and 3 showed that this scene preview benefit was not due to the conceptual category of the scene or identification of the target object in the preview. Experiment 4 demonstrated that the scene preview benefit was unaffected by changing the size of the scene from preview to search. Taken together, the results suggest that an abstract (size invariant) visual representation is generated in an initial scene glimpse and that this representation can be retained in memory and used to guide subsequent eye movements.

Keywords: scene perception, eye movements, gaze control, visual search

A good deal of information can be acquired from an initial brief glimpse of a real-world scene.
In contextual cueing, the position of a target within a group of distractors is learned over repeated exposure to a display with reference to a few nearby items rather than to the global pattern created by the elements. The authors contrasted the role of global and local contexts for contextual cueing in naturalistic scenes. Experiment 1 showed that learned target positions transfer when local information is altered but not when global information is changed. Experiment 2 showed that scene-target covariation is learned more slowly when local, but not global, information is repeated across trials than when global, but not local, information is repeated. Thus, in naturalistic scenes, observers are biased to associate target locations with global contexts.

Keywords: contextual cueing, scenes, visual search, visual attention, visual memory

Repeated exposure to a specific arrangement of target and distractor items leads to a progressively more efficient search, an effect called contextual cueing (e.g., Chua & Chun, 2003; Chun & Jiang, 1998; Jiang & Chun, 2001; Olson & Chun, 2002). For example, in their seminal work on this effect, Chun and Jiang (1998) had observers search for a rotated T hidden among rotated Ls. Over the course of trials, a subset of stimuli was consistently repeated with the arrangement of the target and distractor elements fixed. Across multiple repetitions, search times for repeated displays became faster than those for novel displays. This effect occurred without observers being aware that displays were repeated and without observers having explicit memory of target positions. Similar findings have been obtained with arrays of novel two- and three-dimensional (3D) shapes (Chua & Chun, 2003; Chun & Jiang, 1999). Recently, Brockmole and Henderson (2006; see also Brockmole & Henderson, in press) completed the first investigation of contextual cueing in which real-world scenes constituted the learning context. Like artificial stimulus arrays, real-world scenes have stable structures. For example, each time we go to our neighborhood park, we recognize the athletic fields, playground equipment, and pavilions as the same objects and features arranged in the same spatial configuration. Even objects that can be moved appear in regular spatial arrangements; strollers are often lined up near the benches, and kites are in the air. In an examination of how regularities within real-world environments are used to guide visual attention to behaviorally relevant targets, observers were given the task of searching for and identifying a target letter arbitrarily embedded in scene photographs. Although search times across novel scenes remained constant throughout the experiment, search times for letters appearing in a consistent position within repeated scenes decreased across repetitions. With real-world scenes, however, memory for scene-target covariation was explicit; observers recognized repeated scenes more often than those that were presented once and displayed superior recall of target position within the repeated scenes.