We study the inference of rigid three-dimensional interpretations for the structure and motion of four or more moving points from only two orthographic views of the points. We develop an algorithm to determine whether image data are compatible with a rigid interpretation. As a corollary of this result, we find that the measure of false targets (roughly, nonrigid objects that appear rigid) is zero. We find that if the two views have at least one rigid interpretation, then there is in fact a canonical one-parameter family of rigid interpretations; we show how to compute this family, and we describe precisely how the rigid interpretations vary within it. Because only two views are used, this analysis is also relevant to stereo vision.
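As a rough illustration of how a two-view compatibility test can work, the sketch below uses the affine epipolar constraint. This is a standard formulation for orthographic views and is not necessarily the algorithm developed in this work. Writing the second view as x' = r11 x + r12 y + r13 z + tx and y' = r21 x + r22 y + r23 z + ty and eliminating the unknown depth z gives one linear equation per point, a x' + b y' + c x + d y + e = 0. Rigidity additionally forces a^2 + b^2 = c^2 + d^2, because (a, b, c, d) is proportional to (r23, -r13, r32, -r31). The names `rigid_compatible` and `rotation` and the tolerance `tol` are hypothetical. The test also assumes noncoplanar points and a rotation that is not purely about the line of sight.

```python
import numpy as np


def rotation(axis, angle):
    """Rodrigues' formula: rotation by `angle` (radians) about `axis`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def rigid_compatible(view1, view2, tol=1e-8):
    """Hypothetical rigidity test for two orthographic views of n >= 4 points.

    Assumes noncoplanar points and a rotation that is not purely about the
    line of sight, so the linear constraint has a one-dimensional solution set.
    """
    p = np.asarray(view1, dtype=float)  # (n, 2) image coordinates in view 1
    q = np.asarray(view2, dtype=float)  # (n, 2) image coordinates in view 2
    n = p.shape[0]
    if n < 4:
        raise ValueError("need at least four points")
    # One row per point of  a*x' + b*y' + c*x + d*y + e = 0.
    m = np.column_stack([q[:, 0], q[:, 1], p[:, 0], p[:, 1], np.ones(n)])
    _, s, vt = np.linalg.svd(m)
    # With five or more points, an exact solution needs a zero singular value.
    if n > 4 and s[4] > tol * s[0]:
        return False
    a, b, c, d, _ = vt[-1]  # null vector, known only up to scale and sign
    # For a rotation R the null vector is proportional to
    # (r23, -r13, r32, -r31, e), so a^2 + b^2 = c^2 + d^2 = 1 - r33^2.
    lhs, rhs = a * a + b * b, c * c + d * d
    return abs(lhs - rhs) <= tol * max(lhs + rhs, tol)


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                            # six 3D points
Y = X @ rotation([1, 2, 3], 0.7).T + [0.3, -0.2, 1.0]  # rigid motion
print(rigid_compatible(X[:, :2], Y[:, :2]))            # orthographic: drop z
print(rigid_compatible(rng.normal(size=(4, 2)), rng.normal(size=(4, 2))))
```

With exactly four points the linear system always has a solution, so only the norm condition can reject the data. For randomly chosen image points that condition fails except on a set of measure zero, which mirrors the false-target result stated above.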
Mathematical analyses of motion perception have established minimum combinations of points and distinct views that are sufficient to recover three-dimensional (3D) structure from two-dimensional (2D) images, using such regularities as rigid motion, fixed axis of rotation, and constant angular velocity. To determine whether human subjects could recover 3D information at these theoretical levels, we presented subjects with pairs of displays and asked them to determine whether they represented the same or different 3D structures. The number of points was varied between two and five; the number of views was varied between two and six; and the motion was fixed axis with constant angular velocity, fixed axis with variable velocity, or variable axis with variable velocity. Accuracy increased with views, decreased with points, and was greater with fixed-axis motion. Subjects performed above chance levels even when motion was eliminated, indicating that they exploited regularities in addition to those in the theoretical analyses.

Theoretical investigations of visual motion have provided a number of specific analyses of the minimum number of points and views required to recover three-dimensional (3D) structure from two-dimensional (2D) images. Recovery of 3D structure, in this context, is defined as determining the x, y, and z coordinates of each point, up to a scale factor. These analyses differ in the constraints that are imposed. Ullman (1979) showed that under a rigidity constraint, three views of four noncoplanar points are sufficient to recover structure in an orthographic projection, up to a reflection about the frontal plane. The required numbers of points and views are reduced by adding further constraints, such as planarity (Hoffman & Flinchbaugh, 1982), fixed axis of rotation (Hoffman & Bennett, 1986; Webb & Aggarwal, 1981), and constant angular velocity (Hoffman & Bennett, 1985).
These proofs are summarized in Table 1. A number of empirical studies have addressed issues related to theoretical analyses of the recovery of structure from motion. Several studies (e.g., Braunstein & Andersen, 1986; Schwartz & Sperling, 1983; Todd, 1985) have questioned the generality of the rigidity constraint. Other studies have considered the recovery of structure with small numbers of views or with small numbers of points. Lappin, Doner, and Kotlas (1980) found that subjects could make accurate judgments based on 3D structure …

This research was supported by a contract to D. Hoffman from the Office of Naval Research, Cognitive and Neural Sciences Division, Perceptual Sciences Group. We thank Joseph Lappin and James Todd for helpful comments on an earlier version of this article and Johnna Eastburn and James Tittle for assistance in various aspects of this research.
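The reflection ambiguity in Ullman's (1979) result, where structure is recovered only up to a reflection about the frontal plane, can be checked numerically. Mirror a rigid point set about the frontal plane (z -> -z) and conjugate each rotation by the same mirror. This gives a second rigid interpretation whose orthographic views are identical, for any number of views. The sketch below is illustrative only. The helper names are hypothetical, points are stored as rows, and z is assumed to lie along the line of sight.

```python
import numpy as np

# Reflection about the frontal (image) plane: z -> -z, z along the line of sight.
FLIP = np.diag([1.0, 1.0, -1.0])


def rot_x(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def orthographic_views(points, rotations):
    """Rotate the point set (rows) by each R, then drop z (orthographic projection)."""
    return [(points @ R.T)[:, :2] for R in rotations]


def mirror_interpretation(points, rotations):
    """Depth-reversed structure plus conjugated rotations that give the same views."""
    return points @ FLIP, [FLIP @ R @ FLIP for R in rotations]


rng = np.random.default_rng(1)
P = rng.normal(size=(4, 3))                            # four noncoplanar points
Rs = [np.eye(3), rot_y(0.4), rot_x(0.3) @ rot_y(0.9)]  # three views
P_mirror, Rs_mirror = mirror_interpretation(P, Rs)
same = all(np.allclose(u, v) for u, v in
           zip(orthographic_views(P, Rs), orthographic_views(P_mirror, Rs_mirror)))
print(same)  # True: the two rigid interpretations cannot be told apart
```

The check works because (P M)(M R M)^T = P R^T M, and M changes only the depth coordinate that orthographic projection discards. Since det(M R M) = det(R) = 1, each conjugated matrix is still a proper rotation.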
We show that four orthographic projections of two rigidly linked points are compatible with at most four interpretations of the relative three-dimensional positions of the points if the points rotate about a fixed axis, even when the points as a system undergo arbitrary rigid translations. A fifth view (projection) yields a unique interpretation and reduces to zero the probability that randomly chosen image points will receive a three-dimensional interpretation. Assuming that the points rotate at a constant angular velocity, instead of adding a fifth view, also yields a unique interpretation and likewise reduces this probability to zero.
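One way to see why arbitrary translations of the pair do not change the analysis follows from orthographic projection. In each view, the image-plane separation of the two points is the projection of the rotated 3D separation, so any translation common to both points cancels. The sketch below checks this numerically. The function names, axis, and angles are hypothetical. The sketch does not attempt to recover the (at most four) interpretations themselves.

```python
import numpy as np


def rot_about(axis, angle):
    """Rodrigues' formula: rotation by `angle` (radians) about `axis`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def two_point_views(p1, p2, axis, angles, translations):
    """Orthographic views of two rigidly linked points rotating about a fixed axis.

    Each view rotates the pair, translates it as a whole, then drops depth (z).
    """
    views = []
    for theta, t in zip(angles, translations):
        R = rot_about(axis, theta)
        views.append(((R @ p1 + t)[:2], (R @ p2 + t)[:2]))
    return views


def relative_image_vectors(views):
    """Image-plane separation of the two points in each view."""
    return np.array([q2 - q1 for q1, q2 in views])


rng = np.random.default_rng(2)
p1, p2 = rng.normal(size=3), rng.normal(size=3)
axis, angles = [0.2, 1.0, 0.5], [0.0, 0.5, 1.1, 1.8]  # four views
still = two_point_views(p1, p2, axis, angles, np.zeros((4, 3)))
moving = two_point_views(p1, p2, axis, angles, rng.normal(size=(4, 3)))
# Translations cancel: both sequences give the same relative image vectors.
print(np.allclose(relative_image_vectors(still), relative_image_vectors(moving)))
```

After this reduction, each view constrains only the projection of a single fixed-length vector rotating about the fixed axis. The counting of interpretations can then proceed as if the pair never translated.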
We explore a method of representing solid shape that is useful for visual recognition. We assume that complex shapes are constructed from convex, compact shapes and that construction involves three operations: solid union (to form humps), solid subtraction (to leave dents), and smoothing (to remove discontinuities). The boundaries between shapes joined through these operations are contours of extrema of a principal curvature. Complex objects can be decomposed along these boundaries into convex shapes, the so-called parts. We suggest that this decomposition into parts forms the basis for a shape memory. We show that the part boundaries of an object can be inferred from its occluding contours, at least up to a number of ambiguities.
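A plane-curve analogue gives a feel for locating boundaries along an occluding contour. On a closed contour traversed counterclockwise, concavities appear as points of negative signed curvature. A simple candidate rule marks the negative curvature minima as part boundaries. This is a simplified 2D sketch under that assumption and is not the surface-based (principal curvature) method described above. The function names and the peanut-shaped test contour are hypothetical.

```python
import numpy as np


def signed_curvature(x, y):
    """Signed curvature of a closed, uniformly sampled contour.

    Uses periodic central differences; the sampling step cancels in the ratio.
    """
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    return (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5


def concave_minima(x, y):
    """Indices where curvature is negative and a strict local minimum."""
    k = signed_curvature(x, y)
    is_min = (k < np.roll(k, 1)) & (k < np.roll(k, -1))
    return np.nonzero(is_min & (k < 0))[0]


# Peanut-shaped contour r = 1 + 0.5 cos(2 theta), traversed counterclockwise.
theta = 2 * np.pi * np.arange(400) / 400
r = 1.0 + 0.5 * np.cos(2 * theta)
x, y = r * np.cos(theta), r * np.sin(theta)
# The two waist points at theta = pi/2 and 3*pi/2 (indices 100 and 300).
print(concave_minima(x, y))
```

Each reported index splits the contour at a concavity. On this shape, the two boundaries separate the two convex lobes.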