This study is the first to report the benefits of spatial covert attention on contrast sensitivity across a wide range of spatial frequencies when a target alone was presented in the absence of a local post-mask. We used a peripheral precue (a small circle indicating the target location) to explore the effects of covert spatial attention on contrast sensitivity as assessed by orientation discrimination (Experiments 1-4), detection (Experiments 2 and 3), and localization (Experiment 3) tasks. In all four experiments the target (a Gabor patch ranging in spatial frequency from 0.5 to 10 cpd) was presented alone in one of eight possible locations equidistant from fixation. Contrast sensitivity was consistently higher for peripherally- than for neutrally-cued trials, even though we eliminated variables (distractors, global masks, local masks, and location uncertainty) that are known to contribute to an external noise reduction explanation of attention. When observers were presented with vertical and horizontal Gabor patches, an external noise reduction signal detection model accounted for the cueing benefit in a discrimination task (Experiment 1). However, such a model could not account for this benefit when location uncertainty was reduced by (a) increasing overall performance level (Experiment 2), (b) increasing stimulus contrast to enable fine discriminations of slightly tilted suprathreshold stimuli (Experiment 3), or (c) presenting a local post-mask (Experiment 4). Given that attentional benefits occurred under conditions that exclude all variables predicted by the external noise reduction model, these results support the signal enhancement model of attention.
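The contrast between the two accounts can be made concrete with a small simulation. The sketch below is illustrative only (the d′ values, criterion, and trial counts are assumptions, not the paper's parameters): it implements a maximum-of-outputs signal detection observer in which external noise reduction means the cue prunes the number of monitored locations, while signal enhancement means the cue raises d′ itself. Once location uncertainty is removed (a single monitored location), only enhancement predicts a residual cueing benefit.

```python
# Illustrative sketch (assumed parameters, not the paper's model code):
# a max-rule signal detection observer doing yes/no detection with
# location uncertainty over 8 possible target locations.
import numpy as np

rng = np.random.default_rng(0)
N_TRIALS = 200_000

def percent_correct(d_prime, n_monitored, criterion=1.0):
    """Respond 'present' when the max response across monitored locations
    exceeds the criterion; average hit and correct-rejection rates."""
    present = rng.normal(size=(N_TRIALS, n_monitored))
    present[:, 0] += d_prime                      # signal at one location
    absent = rng.normal(size=(N_TRIALS, n_monitored))
    hits = (present.max(axis=1) > criterion).mean()
    correct_rej = (absent.max(axis=1) <= criterion).mean()
    return 0.5 * (hits + correct_rej)

print("neutral cue, 8 locations        :", percent_correct(1.5, 8))
# External noise reduction: the cue only prunes monitored locations.
print("cued, noise reduction (1 loc)   :", percent_correct(1.5, 1))
# Signal enhancement: with uncertainty already removed, a further
# benefit requires d' itself to increase at the cued location.
print("cued, enhancement (1 loc, d'+)  :", percent_correct(2.0, 1))
```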
When viewing a human face, people often look toward the eyes. Maintaining good eye contact carries significant social value and allows for the extraction of information about gaze direction. When identifying faces, humans also look toward the eyes, but it is unclear whether this behavior is solely a byproduct of the socially important eye movement behavior or whether it has functional importance in basic perceptual tasks. Here, we propose that gaze behavior while determining a person's identity, emotional state, or gender can be explained as an adaptive brain strategy to learn eye movement plans that optimize performance in these evolutionarily important perceptual tasks. We show that humans move their eyes to locations that maximize perceptual performance determining the identity, gender, and emotional state of a face. These optimal fixation points, which differ moderately across tasks, are predicted correctly by a Bayesian ideal observer that integrates information optimally across the face but is constrained by the decrease in resolution and sensitivity from the fovea toward the visual periphery (foveated ideal observer). Neither a model that disregards the foveated nature of the visual system and makes fixations on the local region with maximal information, nor a model that makes center-of-gravity fixations, correctly predicts human eye movements. Extension of the foveated ideal observer framework to a large database of real-world faces shows that the optimality of these strategies generalizes across the population. These results suggest that the human visual system optimizes face recognition performance through guidance of eye movements not only toward but, more precisely, just below the eyes.

natural systems analysis | face processing | saccades

Determining a person's identity, emotional state, and gender is an inherently complex computational problem that has represented a formidable challenge for computer vision systems (1). However, humans demonstrate an impressive ability to perform these tasks (2) accurately within one or two fixations (3) over a large range of spatial scales, head orientations, and lighting. Not surprisingly, the human brain contains areas specialized for the detection and identification of faces (4), as well as for processing their emotional valence (5). While recognizing faces, identifying emotions, or discriminating gender, humans also use a consistent selective sampling of visual information from the eye region and, to a lesser extent, the mouth region through both overt (eye movements) and covert attention mechanisms (6-10). For example, Schyns et al. (8) found that the visual information from the eye region is the main factor determining decisions about a face's identity and gender, whereas Smith et al. (11) found that decisions about a face's emotional valence are driven by both the eye and mouth regions. Furthermore, eye movements have been shown to target the upper face area predominantly. Several studies using long viewing conditions have shown that the eye region attracts t...
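The logic of the foveated ideal observer can be illustrated with a one-dimensional toy model. All numbers below are assumptions chosen for illustration, not the paper's fitted values: local face information is weighted by an eccentricity-dependent sensitivity falloff, and the optimal fixation is the point that maximizes the total usable information. With information concentrated at the eyes and, more weakly, at the mouth, the optimum lands below the eyes rather than on them.

```python
# Toy 1-D foveated-observer sketch (assumed weights and falloff, for
# illustration only): usable information = local information density
# attenuated by an eccentricity-dependent sensitivity function.
import numpy as np

y = np.linspace(0.0, 1.0, 401)   # vertical face axis: 0 = chin, 1 = forehead

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Hypothetical information density: strong at the eyes (y = 0.75),
# weaker at the mouth (y = 0.25).
info = gauss(y, 0.75, 0.05) + 0.6 * gauss(y, 0.25, 0.05)

def usable_info(fixation, falloff=0.25):
    """Total information available from one fixation, given a Gaussian
    drop in sensitivity with distance from the fovea."""
    return float((info * gauss(y, fixation, falloff)).sum())

best = max(y, key=usable_info)
print(f"optimal fixation: y = {best:.2f} (eyes at 0.75, mouth at 0.25)")
# With these assumed numbers the optimum falls just below the eyes:
# the foveation falloff pulls fixation toward the mouth's information.
```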
Visual search, a vital task for humans and animals, has also become a common and important tool for studying many topics central to active vision and cognition, ranging from spatial vision, attention, and oculomotor control to memory, decision making, and rewards. While visual search often seems effortless to humans, recreating human visual search abilities in machines has posed an immense challenge for computer scientists and engineers. What are the brain computations that ensure successful search? This review article draws on efforts from various subfields and discusses the mechanisms and strategies the brain uses to optimize visual search: the psychophysical evidence, their neural correlates, and, where these are unknown, the possible loci of the neural computations. Mechanisms and strategies include the use of knowledge about target, distractor, and background statistical properties, location probabilities, contextual cues, scene context, rewards, and target prevalence, as well as the role of saliency, center-surround organization of search templates, and eye movement plans. I provide overviews of classic and contemporary theories of covert attention and eye movements during search, explaining their differences and similarities. To allow the reader to anchor some of the laboratory findings to real-world tasks, the article includes interviews with three expert searchers: a radiologist, a fisherman, and a satellite image analyst.
Recently, quantitative models based on signal detection theory have been successfully applied to the prediction of human accuracy in visual search for a target that differs from distractors along a single attribute (feature search). The present paper extends these models for visual search accuracy to multidimensional search displays in which the target differs from the distractors along more than one feature dimension (conjunction, disjunction, and triple conjunction displays). The model assumes that each element in the display elicits a noisy representation for each of the relevant feature dimensions. The observer combines the representations across feature dimensions to obtain a single decision variable, and the stimulus with the maximum value determines the response. The model accurately predicts human experimental data on visual search accuracy in conjunctions and disjunctions of contrast and orientation. The model accounts for performance degradation without resorting to a limited-capacity, spatially localized, and temporally serial mechanism by which to bind information across feature dimensions.

Visual search for a target among a set of distractors has been extensively studied by a large number of investigators. Typically, the observer's reaction time for finding the target is measured as a function of the number of distractors (set size) in the display. When the target and the distractors differ along one physical dimension or stimulus attribute (e.g., length, orientation, color, brightness, etc.), the search task is known as a feature search.
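A minimal simulation of this class of model is sketched below (the d′ value and set sizes are illustrative assumptions, not the paper's fitted parameters): every element contributes unit-variance Gaussian noise per feature dimension, responses are combined linearly, and the element with the maximum combined response is chosen. The extra, uninformative noise carried by the shared features is enough to make conjunction accuracy fall faster with set size than feature accuracy, with no serial binding stage.

```python
# Sketch of a maximum-of-outputs SDT search model (illustrative d').
# Element 0 is the target; localization accuracy = how often the target
# yields the largest decision variable.
import numpy as np

rng = np.random.default_rng(1)
N_TRIALS = 100_000
DPRIME = 2.0   # assumed target-distractor separation per feature dimension

def accuracy(set_size, mode):
    r = rng.normal(size=(N_TRIALS, set_size, 2))   # 2 feature dimensions
    if mode == "feature":
        r[:, 0, 0] += DPRIME       # target differs on one dimension only
        decision = r[:, :, 0]      # that dimension alone is informative
    else:  # conjunction: each distractor shares one feature with the target
        r[:, 0, :] += DPRIME                       # target: (d', d')
        half = 1 + (set_size - 1) // 2
        r[:, 1:half, 0] += DPRIME                  # type A distractor: (d', 0)
        r[:, half:, 1] += DPRIME                   # type B distractor: (0, d')
        decision = r.sum(axis=2)   # linear combination across dimensions
    return (decision.argmax(axis=1) == 0).mean()

for n in (2, 4, 8, 16):
    print(f"set size {n:2d}: feature {accuracy(n, 'feature'):.3f}  "
          f"conjunction {accuracy(n, 'conjunction'):.3f}")
```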
Performance in finding a target improves when artificial cues direct covert attention to the target's probable location or locations, but how do predictive cues help observers search for objects in real scenes? Controlling for target detectability and retinal eccentricity, we recorded observers' first saccades during search for objects that appeared in expected and unexpected locations within real scenes. As has been found with synthetic images and cues, the accuracy of first saccades was significantly higher when the target appeared at an expected location rather than an unexpected one. Observers' saccades on target-absent images make it possible to distinguish two mechanisms that might mediate this effect: limited attentional resources versus differential weighting of information (Bayesian priors). Endpoints of first saccades in target-absent images were significantly closer to the expected than to the unexpected locations, a result consistent with the differential-weighting model and inconsistent with limited resources being the sole mechanism underlying the effect.
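The differential-weighting account can be written compactly as a Bayesian saccade planner. The following is a sketch under assumed numbers (prior values, target strength), not the study's analysis code: local evidence is combined with a prior over likely target locations, and the first saccade targets the maximum of the posterior. On target-absent images the likelihood is flat on average, so endpoints gravitate toward the high-prior (expected) location, which is the signature reported here.

```python
# Sketch of a prior-weighted (Bayesian) saccade-targeting model with
# assumed numbers: two candidate regions, 'expected' has the higher prior.
import numpy as np

rng = np.random.default_rng(2)
N_TRIALS = 50_000
PRIOR = np.array([0.8, 0.2])   # assumed prior: expected vs unexpected region

def first_saccade_share(target_region):
    """Fraction of first saccades landing on the expected region (index 0).
    target_region: 0, 1, or None for target-absent images."""
    evidence = rng.normal(size=(N_TRIALS, 2))      # noisy local log-likelihoods
    if target_region is not None:
        evidence[:, target_region] += 1.5          # assumed target strength
    posterior = evidence + np.log(PRIOR)           # log-likelihood + log-prior
    return (posterior.argmax(axis=1) == 0).mean()

print("share of first saccades to the expected region:")
print("  target at expected location :", first_saccade_share(0))
print("  target at unexpected loc.   :", first_saccade_share(1))
# Target-absent: flat likelihood on average, so the prior alone biases
# saccades toward the expected location (differential-weighting signature).
print("  target absent               :", first_saccade_share(None))
```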
In general, humans tend to first look just below the eyes when identifying another person. Does everybody look at the same place on a face during identification, and, if not, does this variability in fixation behavior lead to functional consequences? In two conditions, observers had their free eye movements recorded while they performed a face-identification task. In another condition, the same observers identified faces while their gaze was restricted to specific locations on each face. We found substantial differences, which persisted over time, in where individuals chose to first move their eyes. Observers' systematic departure from a canonical, theoretically optimal fixation point did not correlate with performance degradation. Instead, each individual's looking preference corresponded to an idiosyncratic performance-maximizing point of fixation: Those who looked lower on the face performed better when forced to fixate the lower part of the face. The results suggest an observer-specific synergy between the face-recognition and eye movement systems that optimizes face-identification performance.
Models of human visual processing begin with an initial stage of parallel, independent processing of different physical attributes or features (e.g., color, orientation, motion). A second stage in these models is a temporally serial mechanism (visual attention) that combines or binds information across feature dimensions. Evidence for this serial mechanism is based on experimental results for visual search. I conducted a study of visual search accuracy that carefully controlled for low-level effects: physical similarity of target and distractor, element eccentricity, and eye movements. The larger set-size effects in visual search accuracy for briefly flashed conjunction displays, compared with feature displays, are quantitatively predicted by a simple model in which each feature dimension is processed independently with inherent neural noise and information is combined linearly across feature dimensions. The data are not predicted by a temporally serial mechanism or by a hybrid model with temporally serial and noisy processing. The results do not support the idea that a temporally serial mechanism, visual attention, binds information across feature dimensions, and they show that the conjunction-feature dichotomy is due to the noisy, independent processing of features in the human visual system.
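To make the model comparison explicit, the sketch below (illustrative parameters throughout; the serial model is a deliberately simple stand-in, not the paper's exact alternative) contrasts the noisy-parallel prediction with a serial account in which only a fixed number of elements can be processed during a brief display. The serial account predicts a characteristic sharp, capacity-limited drop in accuracy with set size that differs in shape from the gradual decline produced by noisy parallel processing.

```python
# Illustrative comparison (assumed parameters): noisy-parallel model vs a
# simple serial model that samples only K elements during a brief display.
import numpy as np

rng = np.random.default_rng(3)
N_TRIALS = 100_000
DPRIME, K = 2.5, 4     # assumed per-element d' and serial capacity

def parallel_accuracy(n):
    """All elements processed in parallel with independent noise; pick max."""
    r = rng.normal(size=(N_TRIALS, n))
    r[:, 0] += DPRIME
    return (r.argmax(axis=1) == 0).mean()

def serial_accuracy(n):
    """Only K randomly chosen elements are processed; if the target is among
    them it is found (a simplifying assumption), otherwise the observer guesses."""
    p_sampled = min(K, n) / n
    return p_sampled + (1 - p_sampled) / n

for n in (2, 4, 8, 16):
    print(f"set size {n:2d}: parallel {parallel_accuracy(n):.3f}  "
          f"serial {serial_accuracy(n):.3f}")
```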
In the Posner cueing paradigm, observers' performance in detecting a target is typically better on trials in which the target appears at the cued location than on trials in which it appears at the uncued location. This effect can be explained in terms of a Bayesian observer in which visual attention simply weights the information differently at the cued (attended) and uncued (unattended) locations, without a change in the quality of processing at each location. Alternatively, it could be explained in terms of visual attention changing the shape of the perceptual filter at the cued location. In this study, we use the classification image technique to compare the human perceptual filters at the cued and uncued locations in a contrast discrimination task. We did not find statistically significant differences between the shapes of the inferred perceptual filters across the two locations, nor did the observed differences account for the measured cueing effects in human observers. Instead, we found a difference in the magnitude of the classification images, supporting the idea that visual attention changes the weighting of information at the cued and uncued locations but does not change the quality of processing at each individual location.
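The classification-image logic can be sketched as follows (a minimal simulation with an assumed template, weights, and noise levels; not the study's stimuli or analysis code): external noise fields are added to the display, and correlating trial-by-trial noise with the observer's responses recovers the perceptual filter. A pure weighting change scales the recovered filter's magnitude at the cued versus uncued location without changing its shape.

```python
# Minimal classification-image sketch (assumed template, weights, and
# noise levels; not the study's stimuli or analysis code).
import numpy as np

rng = np.random.default_rng(4)
N_TRIALS, N_PIX = 200_000, 16
x = np.arange(N_PIX)
template = np.exp(-0.5 * ((x - 7.5) / 2.0) ** 2)   # observer's perceptual filter
template /= np.linalg.norm(template)

def classification_image(weight, sigma_ext=1.0):
    """Noise-alone trials at one location; the observer applies the same
    filter but weights that location's evidence by `weight`."""
    noise = rng.normal(scale=sigma_ext, size=(N_TRIALS, N_PIX))
    resp = weight * (noise @ template) + rng.normal(scale=0.5, size=N_TRIALS)
    said_yes = resp > 0
    # classification image = mean noise on 'yes' minus 'no' trials
    return noise[said_yes].mean(axis=0) - noise[~said_yes].mean(axis=0)

ci_cued = classification_image(weight=1.0)
ci_uncued = classification_image(weight=0.5)
# Same shape (both proportional to the template), different magnitude:
print("magnitude ratio cued/uncued:",
      round(np.linalg.norm(ci_cued) / np.linalg.norm(ci_uncued), 2))
print("shape correlation          :",
      round(np.corrcoef(ci_cued, ci_uncued)[0, 1], 3))
```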