We used anti-correlated stimuli to compare the correspondence problem in stereo and motion. Subjects performed a two-interval forced-choice disparity/motion direction discrimination task for different displacements. For anti-correlated 1-D band-pass noise, we found weak reversed depth and motion. With 2-D anti-correlated stimuli, stereo performance was impaired, but the perception of reversed motion was enhanced. We can explain the main features of our data in terms of channels tuned to different spatial frequencies and orientations. We suggest that a key difference between the solution of the correspondence problem by the motion and stereo systems concerns the integration of information at different orientations.
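As a rough illustration of what "anti-correlated" means here, the sketch below builds a displaced copy of a 1-D noise pattern and inverts its contrast. It is not the stimulus used in the study: the binary noise, the circular shift, and the function names `make_pair` and `correlation_at` are all assumptions made for the example, and the band-pass filtering step is omitted.

```python
import random


def make_pair(n, shift, anticorrelated, seed=0):
    """Return two 1-D noise frames; the second is the first displaced by `shift`.

    Assumptions for illustration: binary (+1/-1) noise and a circular shift.
    """
    rng = random.Random(seed)
    first = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    # Displace the pattern circularly by `shift` samples.
    second = [first[(i - shift) % n] for i in range(n)]
    if anticorrelated:
        # Contrast inversion: every bright element becomes dark and vice versa.
        second = [-v for v in second]
    return first, second


def correlation_at(first, second, lag):
    """Normalised circular cross-correlation of the two frames at one lag."""
    n = len(first)
    return sum(first[i] * second[(i + lag) % n] for i in range(n)) / n


correlated = make_pair(256, shift=3, anticorrelated=False)
inverted = make_pair(256, shift=3, anticorrelated=True)
print(correlation_at(*correlated, lag=3))  # match at the true displacement
print(correlation_at(*inverted, lag=3))    # same displacement, sign flipped
```

At the true displacement the correlated pair gives a correlation of +1 and the anti-correlated pair gives -1, so a simple correlation-based matcher finds no positive peak at the correct match in the anti-correlated case.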
Two-frame random-element kinematograms were used to study the matching algorithm employed by the visual system to keep track of moving elements. Previous data have shown that the maximum spatial displacement detectable (dmax) for random-dot kinematogram stimuli increases both with increasing dot size and with decreasing centre frequency for spatially band-pass kinematograms. Both of these findings could be explained by either (i) a matching algorithm sensitive to the number of false targets in the display (informational limit) or (ii) spatial-frequency tuned sensors hardwired for detecting displacements of a constant proportion of their preferred frequency (phase-based limit). The present experiment was designed to differentiate between these alternative explanations. The stimuli were band-pass filtered (difference-of-Gaussian) random-dot patterns. The combination of six dot densities and three filter sizes produced 18 experimental conditions and allowed independent control of the spectral content and filtered-element density of the stimuli. When the dot density was high, dmax was larger for the coarse-filtered stimuli, as predicted by both theories. There was also a critical dot density for each filter size, above which dmax was constant but below which dmax rose sharply. This critical density was higher for fine-filtered stimuli such that at the lowest dot density of 0.025%, dmax was constant for all filter sizes. In support of the informational limit model, dmax was found to be directly proportional to the two-dimensional spacing of filtered elements. In contrast, dmax varied from 0.6 to 8.5 cycles of the stimulus peak frequency, suggesting that a phase-based model of motion detection cannot account for the results.
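The two competing accounts described above make different predictions for how dmax should scale, which can be sketched as follows. This is only a schematic under stated assumptions: the constants `k` and `fraction_of_cycle` are hypothetical, and the Poisson nearest-neighbour formula stands in for the paper's own measure of filtered-element spacing, which the abstract does not specify.

```python
import math


def mean_nearest_neighbour_spacing(elements_per_deg2):
    """Mean nearest-neighbour distance (deg) for a 2-D Poisson point pattern.

    Assumption: filtered elements are treated as randomly scattered points.
    """
    return 0.5 / math.sqrt(elements_per_deg2)


def dmax_informational(elements_per_deg2, k=1.0):
    """Informational limit: dmax proportional to element spacing (k is hypothetical)."""
    return k * mean_nearest_neighbour_spacing(elements_per_deg2)


def dmax_phase_based(peak_freq_cpd, fraction_of_cycle=0.5):
    """Phase-based limit: dmax is a fixed fraction of the stimulus period (deg)."""
    return fraction_of_cycle / peak_freq_cpd


def dmax_in_cycles(dmax_deg, peak_freq_cpd):
    """Express a displacement in cycles of the stimulus peak frequency."""
    return dmax_deg * peak_freq_cpd


# Under the phase-based account, dmax in cycles is the same at every frequency.
for freq in (1.0, 2.0, 4.0):
    print(freq, dmax_in_cycles(dmax_phase_based(freq), freq))

# Under the informational account, quartering the density doubles dmax.
print(dmax_informational(1.0) / dmax_informational(4.0))
```

A phase-based limit predicts a constant dmax when expressed in cycles, so the reported range of 0.6 to 8.5 cycles is the kind of variation that account does not allow for.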
Two-frame random-dot kinematograms (RDKs) of different dot density, area and contrast were used to study the spatial properties of the human visual motion system. It was found that the maximum spatial displacement at which observers could reliably discriminate the direction of motion (dmax) increased gradually by a factor of up to 6.4 as dot density was decreased from 50 to 0.025% for high Michelson contrast (0.997) stimuli. As stimulus area was reduced from 645 deg², this trend gradually disappeared so that by a stimulus area of 2.56 deg², there was no effect of density upon dmax. A further experiment investigated the effects of reducing Michelson contrast from 0.77 to 0.2 on dmax over this same range of dot densities. It was found that at the highest densities, dmax declined as contrast was reduced. Furthermore, for contrasts at and below 0.4, dmax did not vary with density over the range 50 to 5%. These results can be accounted for by the fact that both reducing contrast, while keeping density fixed, and reducing density, while maintaining a fixed high contrast, reduce the stimulus mean luminance. For all contrasts, decreasing density below 5% led to an increase in dmax. However, the rate of this increase was slower for the lower contrast stimuli. A two-stage model based on bandpass filtering followed by an informationally limited motion detection stage is proposed and shown to provide a good account of these data.
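The link between contrast, density and mean luminance can be made concrete with a small sketch. Michelson contrast is (Lmax - Lmin) / (Lmax + Lmin). The rest is an assumption for illustration only: bright dots on a darker background whose luminance is held fixed, with contrast lowered by dimming the dots. The abstract does not state how the display was actually configured, and the function names are hypothetical.

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast of a two-level pattern."""
    return (l_max - l_min) / (l_max + l_min)


def dot_luminance_for_contrast(contrast, l_min):
    """Dot luminance giving the requested Michelson contrast.

    Assumption: the background luminance l_min is fixed and only the dots change.
    """
    return l_min * (1.0 + contrast) / (1.0 - contrast)


def mean_luminance(density, l_max, l_min):
    """Space-averaged luminance when a fraction `density` of the area is dots."""
    return density * l_max + (1.0 - density) * l_min


background = 1.0  # arbitrary luminance units
for contrast in (0.77, 0.4, 0.2):
    dots = dot_luminance_for_contrast(contrast, background)
    for density in (0.5, 0.05):
        print(contrast, density, mean_luminance(density, dots, background))
```

Under these assumptions, lowering contrast at a fixed density and lowering density at a fixed contrast both reduce the mean luminance, which is the relationship the abstract appeals to.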
Previous work [Prince, S. J. D., & Eagle, R. A. (1999). Size-disparity correlation in human binocular depth perception. Proceedings of the Royal Society: Biological Sciences, 266, 1361-1365] has demonstrated that disparity sign discrimination performance in isolated bandpass patterns is supported at disparities much larger than a phase disparity model might predict. One possibility is that this extended performance relies on a separate second-order system [Hess, R. F., & Wilcox, L. M. (1994). Linear and non-linear filtering in stereopsis. Vision Research, 34, 2431-2438]. Here, a 'weighted directional energy' model is developed which explains a large body of crossed versus uncrossed disparity discrimination data with a single mechanism. This model assumes a population of binocular complex cells at every image point with a range of position disparity shifts. These cells sample a local energy function which is weighted so that energy at large disparities is relatively attenuated. Disparity sign is determined by summing and comparing energy at crossed and uncrossed disparities in the presence of noise. The model qualitatively predicts matching data for one-dimensional Gabor stimuli. This scheme also predicts DMax in Gabor stimuli and filtered noise. Moreover, a range of 'non-linear' phenomena, in which disparity is perceived from contrast envelope information alone, can be explained. The weighted directional energy model presents a biologically plausible, parsimonious explanation of matching behaviour in bandpass stimuli for both 'first-order' and 'second-order' stimuli which obviates the need for multiple mechanisms in stereo correspondence.
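The decision stage described above (weight the energy, sum over crossed and uncrossed disparities, compare) can be sketched in a few lines. This is not the published model: the Gaussian weighting, the parameter `sigma`, the convention that negative disparities are crossed, and the function name `disparity_sign` are all assumptions, and the noise term is left out to keep the example deterministic.

```python
import math


def disparity_sign(energy_by_disparity, sigma=1.0):
    """Compare weighted energy summed over crossed and uncrossed disparities.

    energy_by_disparity maps a position disparity to a local binocular energy.
    Assumptions: negative disparities are "crossed", and the weighting is a
    Gaussian of width sigma that attenuates energy at large disparities.
    """
    crossed = 0.0
    uncrossed = 0.0
    for disparity, energy in energy_by_disparity.items():
        weight = math.exp(-(disparity ** 2) / (2.0 * sigma ** 2))
        if disparity < 0:
            crossed += weight * energy
        elif disparity > 0:
            uncrossed += weight * energy
    if crossed > uncrossed:
        return "crossed"
    if uncrossed > crossed:
        return "uncrossed"
    return "ambiguous"


# Hypothetical energy samples: a large peak at a big crossed disparity and a
# smaller peak at a small uncrossed disparity.
energy = {-2: 1.0, -1: 0.2, 1: 0.5, 2: 0.1}
print(disparity_sign(energy, sigma=1.0))    # strong attenuation of large disparities
print(disparity_sign(energy, sigma=100.0))  # almost no attenuation
```

With these made-up values the narrow weighting reports "uncrossed" and the nearly flat weighting reports "crossed", which shows how attenuating energy at large disparities can change the perceived sign.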
Although binocular disparity and motion parallax are powerful cues for depth, neither, in isolation, can specify information about both object size and depth. It has been shown that information from both cues can be combined to specify the size, depth, and distance of an object in a scene (Richards, 1985 Journal of the Optical Society of America A 2 343-349). Experiments are reported in which natural viewing and physical stimuli have been used to investigate the nature of size and depth perception on the basis of disparity and parallax presented separately and together at a range of viewing distances. Observers adjusted the relative position of three bright LEDs, which were constrained to form a triangle in plan view with the apex pointing toward the observer, so that its dimensions matched those of a standard held by the subject. With static monocular viewing, depth settings were inaccurate and erratic. When both cues were present together, accuracy increased and the perceptual outcome was consistent with an averaging of the information provided by both cues. When an apparent bias evident in the observers' responses (the tendency to under-estimate the size of the standard) was taken into account, accuracy was high and size and depth constancy were close to 100%. In addition, given this assumption, the same estimate of viewing distance was used to scale size and depth estimates.