Observers perceive objects in the world as stable over space and time, even though the visual experience of those objects is often discontinuous and distorted due to masking, occlusion, camouflage, or noise. How are we able to easily and quickly achieve stable perception in spite of this constantly changing visual input? It was previously shown that observers experience serial dependence in the perception of features and objects, an effect that extends up to 15 seconds back in time. Here, we asked whether the visual system utilizes an object's prior physical location to inform future position assignments in order to maximize location stability of an object over time. To test this, we presented subjects with small targets at random angular locations relative to central fixation in the peripheral visual field. Subjects reported the perceived location of the target on each trial by adjusting a cursor's position to match its location. Subjects made consistent errors when reporting the perceived position of the target on the current trial, mislocalizing it toward the position of the target in the preceding two trials (Experiment 1). This pull in position perception occurred even when a response was not required on the previous trial (Experiment 2). In addition, we show that serial dependence in perceived position occurs immediately after stimulus presentation, and it is a fast stabilization mechanism that does not require a delay (Experiment 3). This indicates that serial dependence occurs for position representations and facilitates the stable perception of objects in space. Taken together with previous work, our results show that serial dependence occurs at many stages of visual processing, from initial position assignment to object categorization.
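One way to make the reported pull toward the previous target concrete is to simulate it and then recover it from the simulated reports. The sketch below is illustrative only and is not the authors' analysis: it assumes a purely linear pull (real serial dependence is often tuned to the distance between successive stimuli), and the function names, the `pull` weight, and the noise level are all hypothetical.

```python
import random


def wrap_deg(angle):
    """Wrap an angular difference into the range [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def simulate_trials(n_trials, pull=0.1, noise_sd=2.0, seed=0):
    """Simulate reports of angular position that are attracted toward
    the previous trial's target (assumed linear pull, hypothetical values)."""
    rng = random.Random(seed)
    targets = [rng.uniform(0.0, 360.0) for _ in range(n_trials)]
    responses = []
    for i, target in enumerate(targets):
        bias = 0.0
        if i > 0:
            # Signed angular distance from current target to previous target.
            bias = pull * wrap_deg(targets[i - 1] - target)
        responses.append((target + bias + rng.gauss(0.0, noise_sd)) % 360.0)
    return targets, responses


def serial_dependence_slope(targets, responses):
    """Least-squares slope of response error on the relative location of
    the previous target. A positive slope means attraction."""
    xs, ys = [], []
    for i in range(1, len(targets)):
        xs.append(wrap_deg(targets[i - 1] - targets[i]))
        ys.append(wrap_deg(responses[i] - targets[i]))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    targets, responses = simulate_trials(2000, pull=0.1, seed=1)
    print("estimated pull:", serial_dependence_slope(targets, responses))
```

Wrapping both the predictor and the error keeps the regression meaningful for angular positions, where 350 degrees and 10 degrees are only 20 degrees apart.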
Much of the richness of perception is conveyed by implicit, rather than image or feature-level, information. The perception of animacy or lifelikeness of objects, for example, cannot be predicted from image level properties alone. Instead, perceiving lifelikeness seems to be an inferential process and one might expect it to be cognitively demanding and serial rather than fast and automatic. If perceptual mechanisms exist to represent lifelikeness, then observers should be able to perceive this information quickly and reliably, and should be able to perceive the lifelikeness of crowds of objects. Here, we report that observers are highly sensitive to the lifelikeness of random objects and even groups of objects. Observers' percepts of crowd lifelikeness are well predicted by independent observers' lifelikeness judgements of the individual objects comprising that crowd. We demonstrate that visual impressions of abstract dimensions can be achieved with summary statistical representations, which underlie our rich perceptual experience.
The visual system extracts average features from groups of objects (Ariely, 2001; Dakin & Watt, 1997; Watamaniuk & Sekuler, 1992), including high-level stimuli such as faces (Haberman & Whitney, 2007, 2009). This phenomenon, known as ensemble perception, implies a covert process, which would not require fixation of individual stimulus elements. However, some evidence suggests that ensemble perception may instead be a process of averaging foveal input across sequential fixations (Ji, Chen, & Fu, 2013; Jung, Bulthoff, Thornton, Lee, & Armann, 2013). To test directly whether foveating objects is necessary, we measured observers' sensitivity to average facial emotion in the absence of foveal input. Subjects viewed arrays of 24 faces, either in the presence or absence of a gaze-contingent foveal occluder, and adjusted a test face to match the average expression of the array. We found no difference in accuracy between the occluded and non-occluded conditions, demonstrating that foveal input is not required for ensemble perception. Unsurprisingly, without foveal input, subjects spent significantly less time directly fixating faces, but this did not translate into any difference in sensitivity to ensemble expression. Next, we varied the number of faces visible from the set to test whether subjects average multiple faces from the crowd. In both conditions, subjects' performance improved as more faces were presented, indicating that subjects integrated information from multiple faces in the display regardless of whether they had access to foveal information. Our results demonstrate that ensemble perception can be a covert process, not requiring access to direct foveal information.
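The reasoning that performance should improve as more faces are shown, if observers integrate several faces, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the study's model: expressions are treated as values on a one-dimensional scale, the observer is assumed to average a random subset of the array with independent noise on each face, and every name and parameter value here is hypothetical.

```python
import random
import statistics


def simulate_mean_error(n_sampled, set_size=24, n_trials=2000,
                        expression_sd=10.0, noise_sd=5.0, seed=0):
    """Mean absolute error of an observer who averages `n_sampled` noisy
    faces to estimate the average expression of the full array."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        # Hypothetical expression values for one array of faces.
        faces = [rng.gauss(0.0, expression_sd) for _ in range(set_size)]
        true_mean = statistics.fmean(faces)
        # The observer sees only a subset, each face perceived with noise.
        sampled = rng.sample(faces, n_sampled)
        noisy = [value + rng.gauss(0.0, noise_sd) for value in sampled]
        errors.append(abs(statistics.fmean(noisy) - true_mean))
    return statistics.fmean(errors)


if __name__ == "__main__":
    for k in (1, 4, 12, 24):
        print(k, "faces averaged, mean abs error:", simulate_mean_error(k))
```

Under these assumptions the error shrinks as more faces enter the average, which is the qualitative pattern expected from an observer who integrates multiple faces rather than relying on a single one.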