Bayesian cue combination models have been used to examine how human observers combine information from several cues to form estimates of linear quantities like depth. Here we develop an analogous theory for circular quantities like planar direction. The circular theory is broadly similar to the linear theory but differs in significant ways. First, in the circular theory the combined estimate is a nonlinear function of the individual cue estimates. Second, in the circular theory the mean of the combined estimate is affected not only by the means of individual cues and the weights assigned to individual cues but also by the variability of individual cues. Third, in the circular theory the combined estimate can be less certain than the individual estimates, if the individual estimates disagree with one another. Fourth, the circular theory does not have some of the closed-form expressions available in the linear theory, so data analysis requires numerical methods. We describe a vector sum model that gives a heuristic approximation to the circular theory's behavior. We also show how the theory can be extended to deal with spherical quantities like direction in three-dimensional space.
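The vector-sum heuristic described above can be sketched in a few lines. This is an illustrative implementation, not the authors' code: each cue is treated as a plane vector whose direction is the cue's mean estimate and whose length is an assumed reliability weight (playing the role of a von Mises concentration parameter), and the combined estimate is the direction of the vector sum. The resultant length also illustrates the third point: when the cues disagree, the resultant is shorter, i.e., the combined estimate is less certain.

```python
import numpy as np

def combine_circular(mu1, mu2, w1, w2):
    """Heuristic vector-sum combination of two circular cue estimates.

    mu1, mu2 : cue directions in radians.
    w1, w2   : assumed reliability weights (von Mises-style
               concentration parameters; hypothetical values).

    Returns (combined direction, resultant length). The resultant
    length shrinks as the cues disagree, reflecting reduced certainty.
    """
    x = w1 * np.cos(mu1) + w2 * np.cos(mu2)
    y = w1 * np.sin(mu1) + w2 * np.sin(mu2)
    return np.arctan2(y, x), np.hypot(x, y)
```

Note the nonlinearity: unlike a weighted average of angles, the combined direction depends on the cue directions through sines and cosines, and the output certainty depends on cue agreement, not just on the input weights.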
Every biological or artificial visual system faces the problem that images are highly ambiguous, in the sense that every image depicts an infinite number of possible 3D arrangements of shapes, surface colors, and light sources. When estimating 3D shape from shading, the human visual system partly resolves this ambiguity by relying on the light-from-above prior, an assumption that light comes from overhead. However, light comes from overhead only on average, and most images contain visual information that contradicts the light-from-above prior, such as shadows indicating oblique lighting. How does the human visual system perceive 3D shape when there are contradictions between what it assumes and what it sees? Here we show that the visual system combines the light-from-above prior with visual lighting cues using an efficient statistical strategy that assigns a weight to the prior and to the cues and finds a maximum-likelihood lighting direction estimate that is a compromise between the two. The prior receives surprisingly little weight and can be overridden by lighting cues that are barely perceptible. Thus, the light-from-above prior plays a much more limited role in shape perception than previously thought, and instead human vision relies heavily on lighting cues to recover 3D shape. These findings also support the notion that the visual system efficiently integrates priors with cues to solve the difficult problem of recovering 3D shape from 2D images.
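The weighting scheme described above follows the standard maximum-likelihood cue-combination rule. As a minimal sketch (assuming independent Gaussian noise on the prior and on the lighting cue, a simplification of the actual circular estimates), each source is weighted in inverse proportion to its variance, so a low-variance lighting cue can override a high-variance prior:

```python
def weighted_estimate(mu_prior, sigma_prior, mu_cue, sigma_cue):
    """Maximum-likelihood compromise between a prior and a cue,
    assuming independent Gaussian noise on each source.

    Weights are inversely proportional to variance, so the less
    variable source dominates the combined estimate."""
    w_prior = 1.0 / sigma_prior ** 2
    w_cue = 1.0 / sigma_cue ** 2
    return (w_prior * mu_prior + w_cue * mu_cue) / (w_prior + w_cue)
```

For example, a light-from-above prior at -90 degrees with three times the standard deviation of a cue at 0 degrees contributes only a ninth of the cue's weight, pulling the estimate just a few degrees from the cue, which is the sense in which a weakly weighted prior is easily overridden.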
All images are highly ambiguous, and to perceive 3-D scenes, the human visual system relies on assumptions about what lighting conditions are most probable. Here we show that human observers' assumptions about lighting diffuseness are well matched to the diffuseness of lighting in real-world scenes. We use a novel multidirectional photometer to measure lighting in hundreds of environments, and we find that the diffuseness of natural lighting falls in the same range as previous psychophysical estimates of the visual system's assumptions about diffuseness. We also find that natural lighting is typically directional enough to override human observers' assumption that light comes from above. Furthermore, we find that, although human performance on some tasks is worse in diffuse light, this can be largely accounted for by intrinsic task difficulty. These findings suggest that human vision is attuned to the diffuseness levels of natural lighting conditions.
A central puzzle in vision science is how perceptions that are routinely at odds with physical measurements of real world properties can arise from neural responses that nonetheless lead to effective behaviors. Here we argue that the solution depends on: (1) rejecting the assumption that the goal of vision is to recover, however imperfectly, properties of the world; and (2) replacing it with a paradigm in which perceptions reflect biological utility based on past experience rather than objective features of the environment. Present evidence is consistent with the conclusion that conceiving vision in wholly empirical terms provides a plausible way to understand what we see and why.
A central goal of visual neuroscience is to relate the selectivity of individual neurons to perceptual judgments, such as detection of a visual pattern at low contrast or in noise. Since neurons in early areas of visual cortex carry information only about a local patch of the image, detection of global patterns must entail spatial pooling over many such neurons. Physiological methods provide access to local detection mechanisms at the single-neuron level but do not reveal how neural responses are combined to determine the perceptual decision. Behavioral methods provide access to perceptual judgments of a global stimulus but typically do not reveal the selectivity of the individual neurons underlying detection. Here we show how the existence of a nonlinearity in spatial pooling does allow properties of these early mechanisms to be estimated from behavioral responses to global stimuli. As an example, we consider detection of large-field sinusoidal gratings in noise. Based on human behavioral data, we estimate the length and width tuning of the local detection mechanisms and show that it is roughly consistent with the tuning of individual neurons in primary visual cortex of primate. We also show that a local energy model of pooling based on these estimated receptive fields is much more predictive of human judgments than competing models, such as probability summation. In addition to revealing underlying properties of early detection and spatial integration mechanisms in human cortex, our findings open a window on new methods for relating system-level perceptual judgments to neuron-level processing.
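The two competing pooling rules mentioned above can be contrasted in a short sketch. This is an illustrative simplification, not the authors' model: the local energy rule sums squared local mechanism responses, while probability summation is approximated by Minkowski pooling with an exponent `beta` (the value 3.5 here is a conventional illustrative choice, not taken from the paper):

```python
import numpy as np

def energy_pooling(responses):
    """Local energy model: square each local mechanism's response
    and sum over space to form the detection variable."""
    r = np.asarray(responses, dtype=float)
    return np.sum(r ** 2)

def probability_summation(responses, beta=3.5):
    """Minkowski approximation to probability summation over local
    detectors; beta is an assumed pooling exponent."""
    r = np.abs(np.asarray(responses, dtype=float))
    return np.sum(r ** beta) ** (1.0 / beta)
```

The squaring nonlinearity in the energy rule is what makes the pooled behavioral response informative about the local mechanisms: weak local responses are strongly discounted, so the global decision variable depends on how well the stimulus matches the local receptive fields.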
Shape is a defining feature of objects. Yet, no image-computable model accurately predicts how similar or different shapes appear to human observers. To address this, we developed a model ('ShapeComp'), based on over 100 shape features (e.g., area, compactness, Fourier descriptors). When trained to capture the variance in a database of >25,000 animal silhouettes, ShapeComp predicts human shape similarity judgments almost perfectly (r² > 0.99) without fitting any parameters to human data. To test the model, we created carefully selected arrays of complex novel shapes using a Generative Adversarial Network trained on the animal silhouettes, which we presented to observers in a wide range of tasks. Our findings show that human shape perception is inherently multidimensional and optimized for comparing natural shapes. ShapeComp outperforms conventional metrics, and can also be used to generate perceptually uniform stimulus sets, making it a powerful tool for investigating shape and object representations in the human brain.
Shape is a defining feature of objects, and human observers can effortlessly compare shapes to determine how similar they are. Yet, to date, no image-computable model can predict how visually similar or different shapes appear. Such a model would be an invaluable tool for neuroscientists and could provide insights into computations underlying human shape perception. To address this need, we developed a model (‘ShapeComp’), based on over 100 shape features (e.g., area, compactness, Fourier descriptors). When trained to capture the variance in a database of >25,000 animal silhouettes, ShapeComp accurately predicts human shape similarity judgments between pairs of shapes without fitting any parameters to human data. To test the model, we created carefully selected arrays of complex novel shapes using a Generative Adversarial Network trained on the animal silhouettes, which we presented to observers in a wide range of tasks. Our findings show that incorporating multiple ShapeComp dimensions facilitates the prediction of human shape similarity across a small number of shapes, and also captures much of the variance in the multiple arrangements of many shapes. ShapeComp outperforms both conventional pixel-based metrics and state-of-the-art convolutional neural networks, and can also be used to generate perceptually uniform stimulus sets, making it a powerful tool for investigating shape and object representations in the human brain.
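One of the classical descriptors of the kind ShapeComp aggregates is compactness. As a minimal sketch (the exact feature set and normalization used by ShapeComp are not specified here), compactness compares a shape's area to its perimeter, equaling 1 for a circle and falling toward 0 for elongated or irregular outlines:

```python
import numpy as np

def compactness(area, perimeter):
    """Compactness descriptor: 4*pi*A / P**2.

    Equals 1 for a circle (the most compact planar shape) and
    decreases for elongated or irregular silhouettes."""
    return 4.0 * np.pi * area / perimeter ** 2
```

A model like ShapeComp concatenates on the order of a hundred such features into a vector per silhouette, so that perceptual similarity between two shapes can be approximated by distance between their feature vectors.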