One of the ways that we perceive shape is through seeing motion. Visual motion may be actively generated (for example, in locomotion), or passively observed. In the study of the perception of three-dimensional structure from motion, the non-moving, passive observer in an environment of moving rigid objects has been used as a substitute for an active observer moving in an environment of stationary objects; this 'rigidity hypothesis' has played a central role in computational and experimental studies of structure from motion. Here we show that this is not an adequate substitution because active and passive observers can perceive three-dimensional structure differently, despite experiencing the same visual stimulus: active observers' perception of three-dimensional structure depends on extraretinal information about their own movements. The visual system thus treats objects that are stationary (in an allocentric, earth-fixed reference frame) differently from objects that are merely rigid. These results show that action makes an important contribution to depth perception, and argue for a revision of the rigidity hypothesis to incorporate the special case of stationary objects.
Because extraretinal information has long been considered to play little or no role in spatial vision, the study of structure from motion (SfM) has confounded a moving observer perceiving a stationary object with a non-moving observer perceiving a rigid object undergoing equal and opposite motion. However, it has recently been shown that extraretinal information does play an important role in the extraction of structure from motion, by enhancing motion cues for objects that are stationary in an allocentric, world-fixed reference frame (Nature 409 (2001) 85). Here, we test whether stationarity per se is a criterion in SfM by pitting it against rigidity. We created stimuli that, for a moving observer, offer two interpretations: one that is rigid but non-stationary, and another that is more stationary but less rigid. In two experiments, with subjects reporting either structure or motion, we show that stationary, non-rigid solutions are preferred over rigid, non-stationary solutions, and that when no perfectly stationary solution is available, the visual system prefers the solution that is most stationary. These results demonstrate that allocentric criteria, derived from extraretinal information, participate in reconstructing the visual scene.
Local motion detectors can only provide the velocity component perpendicular to a moving line that crosses their receptive field, leading to an ambiguity known as the "aperture problem". This problem is solved exactly for rigid objects translating in the screen plane via the intersection of constraints (IOC). In natural scenes, however, object motions are not restricted to fronto-parallel translations, and several objects with distinct motions may be present in the visual field. Under these conditions the usual IOC construction is no longer valid, which raises questions as to its use as a basis for spatial integration and selection of motion signals in uniform and non-uniform velocity fields. We measured the influence of the motion of random dots on the perceived direction of a horizontal line grating when dots and lines were seen through different apertures. The random dots were mapped onto a plane that translated either fronto-parallel (uniform 2D translation) or in depth (3D, corresponding to a non-uniform projected velocity field, either expanding or contracting). The grating either moved rigidly with the dots or in the opposite direction. Subjects' responses show that the direction of line grating movement was reliably influenced only in conditions consistent with rigid motion; where there was a reliable influence, the perceived direction was consistent with the dot motion pattern. This finding points to the existence of a motion-based selection mechanism that operates prior to the disambiguation of the line movement direction. Disambiguation could occur for both uniform and non-uniform velocity fields, even though in the latter case none of the individual dots indicated the proper direction in 2D velocity space. Finally, the capture by non-uniform motion patterns was less robust than that by uniform 2D translations, and could be disrupted by manipulations of the shape and size of the apertures.
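The intersection-of-constraints construction mentioned above can be illustrated with a small worked example. Each aperture measurement of a translating line constrains the true 2D velocity v to satisfy n · v = s, where n is the line's unit normal and s the measured normal speed; for a rigid fronto-parallel translation, the true velocity is the point in velocity space where these constraint lines intersect. The sketch below (illustrative only; the function name and least-squares formulation are ours, not from the study) recovers v from two or more such constraints:

```python
def ioc_velocity(constraints):
    """Recover the 2D velocity of a rigidly translating contour from
    aperture measurements via the intersection of constraints (IOC).

    Each constraint is a tuple (nx, ny, s): the unit normal of a
    moving line and the speed measured along that normal.  Every
    constraint line in velocity space satisfies n . v = s; with two
    or more non-parallel normals, the least-squares solution of the
    stacked system is their intersection point.
    """
    # Accumulate the normal equations (A^T A) v = A^T b for the 2x2 system.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for nx, ny, s in constraints:
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * s
        b2 += ny * s
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        # All normals parallel: this is exactly the aperture problem,
        # and the velocity along the line remains ambiguous.
        raise ValueError("constraints are parallel; velocity is ambiguous")
    vx = (a22 * b1 - a12 * b2) / det
    vy = (a11 * b2 - a12 * b1) / det
    return vx, vy

# A contour translating at v = (3, 4), seen through two apertures:
# a vertical line (normal (1, 0)) yields normal speed 3, and a
# horizontal line (normal (0, 1)) yields normal speed 4.
v = ioc_velocity([(1.0, 0.0, 3.0), (0.0, 1.0, 4.0)])
```

Note that, as the abstract emphasizes, this construction presupposes a single rigid fronto-parallel translation: for motion in depth the projected velocity field is non-uniform, no single v satisfies all constraints, and the IOC intersection is no longer well defined.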