A new formulation of the image partitioning problem is presented: construct a complete and stable description of an image--in terms of a specified descriptive language--that is simplest in the sense of being shortest. We show that a descriptive language limited to a low-order polynomial description of the intensity variation within each region and a chain-code-like description of the region boundaries yields intuitively satisfying partitions for a wide class of images. The advantage of this formulation is that it can be extended to deal with subsequent steps of the image understanding problem (or to deal with other attributes, such as texture) in a natural way by augmenting the descriptive language. Experiments performed on a variety of both real and synthetic images demonstrate the superior performance of this approach over partitioning techniques based on clustering vectors of local image attributes and standard edge-detection techniques.
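The partitioning criterion described above can be sketched as a two-part description-length cost per region: bits to encode a low-order polynomial intensity model plus its residuals, and bits to encode the region boundary as a chain code. The function below is an illustrative sketch under assumed coding conventions (a Gaussian residual code, fixed bits per coefficient and per chain-code move), not the paper's exact coding scheme.

```python
import numpy as np

def description_length(region_pixels, boundary_length, poly_order=2,
                       bits_per_coeff=16, bits_per_move=2):
    """Two-part description-length cost for one region.

    Hedged sketch: region intensity is fit with a low-order 2D
    polynomial; the boundary is charged a few bits per chain-code move.
    The coding constants here are assumptions for illustration.
    """
    ys, xs, vals = region_pixels  # pixel coordinates and intensities
    # Design matrix for a polynomial of total degree <= poly_order in (x, y).
    terms = [xs**i * ys**j for i in range(poly_order + 1)
             for j in range(poly_order + 1 - i)]
    A = np.stack(terms, axis=1).astype(float)
    coeffs, *_ = np.linalg.lstsq(A, vals.astype(float), rcond=None)
    residual = vals - A @ coeffs
    # Residual bits under a Gaussian code; sigma floored at one grey level.
    sigma = max(residual.std(), 1.0)
    residual_bits = len(vals) * 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
    model_bits = len(coeffs) * bits_per_coeff
    boundary_bits = boundary_length * bits_per_move
    return model_bits + residual_bits + boundary_bits
```

Under this cost, a region whose intensity is well explained by the polynomial model is cheap to describe, so the shortest total description favors merging smoothly varying areas and placing boundaries only where the model genuinely fails.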
Line drawings provide an effective means of communication about the geometry of 3D objects. An understanding of how to duplicate the way humans interpret line drawings is extremely important in enabling man-machine communication with respect to images, diagrams, and spatial constructs. In particular, such an understanding could be used to provide the human with the capability to create a line-drawing sketch of a polyhedral object that the machine can automatically convert into the intended 3D model. A recently published paper (Marill 1991) presented a simple optimization procedure supposedly able to duplicate human judgment in recovering the 3D "wire frame" geometry of objects depicted in line drawings. Marill provided some impressive examples, but no theoretical justification for his approach. Here, we introduce our own work by first critically examining Marill's algorithm. We provide an explanation for why Marill's algorithm was able to perform as well as it did on the examples he presented, discuss its weaknesses, and show very simple examples where it fails. We then provide an algorithm that improves on Marill's results. In particular, we show that an effective objective function must favor both symmetry and planarity--Marill deals only with the symmetry issue. By modifying Marill's objective function to explicitly favor planar-faced solutions, and by using a more competent optimization technique, we were able to demonstrate significantly improved performance on all of the examples Marill provided and on the additional ones we constructed ourselves. Finally, we examine some questions relevant to the implications of this work for understanding the human ability to interpret line drawings.
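The combined objective the abstract argues for can be sketched as a symmetry term plus a planarity term. Following the spirit of Marill's criterion, the symmetry term below is the standard deviation of the angles between edges meeting at a shared vertex; the planarity term penalizes face vertices that stray from their best-fit plane. The specific functions, weight, and face representation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def objective(vertices, edges, faces, w_planar=1.0):
    """Sketch of an objective favoring both symmetry and planarity.

    vertices: (n, 3) coordinates (z values are the unknowns to optimize);
    edges: list of index pairs; faces: list of vertex-index tuples.
    """
    V = np.asarray(vertices, dtype=float)
    # Symmetry term: standard deviation of angles at shared vertices.
    angles = []
    for (a, b) in edges:
        for (c, d) in edges:
            shared = {a, b} & {c, d}
            if (a, b) < (c, d) and len(shared) == 1:
                p = shared.pop()
                u = V[b if a == p else a] - V[p]
                w = V[d if c == p else c] - V[p]
                cos = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))
                angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    sda = np.std(angles)
    # Planarity term: mean squared out-of-plane spread of each face's vertices,
    # measured by the smallest singular value of the centered vertex matrix.
    planar = 0.0
    for face in faces:
        P = V[list(face)]
        P = P - P.mean(axis=0)
        planar += np.linalg.svd(P, compute_uv=False)[-1] ** 2 / len(face)
    return sda + w_planar * planar
```

With `w_planar = 0` this reduces to a pure angle-variance criterion in the style of Marill's; the planarity term is what rules out the non-planar "wire frame" solutions such a criterion can otherwise prefer.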
The colors, textures, and shapes of shadows are physically constrained in several ways in natural scenes. The visual system appears to ignore these constraints, however, and to accept many patterns as shadows even though they could not occur naturally. In the stimuli that we have studied, the only requirements for the perception of depth due to shadows were that shadow regions be darker than the surrounding, nonshadow regions and that there be consistent contrast polarity along the shadow border. Three-dimensional shape due to shadows was perceived when shadow areas were filled with colors or textures that could not occur in natural scenes, when shadow and nonshadow regions had textures that moved in different directions, or when they were presented on different depth planes. The results suggest that the interpretation of shadows begins with the identification of acceptable shadow borders by a cooperative process that requires consistent contrast polarity across a range of scales at each point along the border. Finally, we discuss how the identification of a shadow region can help the visual system to patch together areas that are separated by shadow boundaries, to identify directions of surface curvature, and to select a preferred three-dimensional interpretation while rejecting others.
Abstract. Our goal is to reconstruct both the shape and reflectance properties of surfaces from multiple images. We argue that an object-centered representation is most appropriate for this purpose because it naturally accommodates multiple sources of data, multiple images (including motion sequences of a rigid object), and self-occlusions. We then present a specific object-centered reconstruction method and its implementation. The method begins with an initial estimate of surface shape provided, for example, by triangulating the result of conventional stereo. The surface shape and reflectance properties are then iteratively adjusted to minimize an objective function that combines information from multiple input images. The objective function is a weighted sum of stereo, shading, and smoothness components, where the weight varies over the surface. For example, the stereo component is weighted more strongly where the surface projects onto highly textured areas in the images, and less strongly otherwise. Thus, each component has its greatest influence where its accuracy is likely to be greatest. Experimental results on both synthetic and real images are presented.
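The spatially varying weighting described in this abstract can be sketched as a per-point convex combination of the stereo and shading terms driven by a local texture measure, plus a small constant smoothness regularizer. The weighting scheme below is an illustrative assumption; the paper defines its own weights.

```python
import numpy as np

def combined_cost(stereo_err, shading_err, smooth_err, texture):
    """Sketch of the weighted objective: per-surface-point errors are
    combined with weights that depend on local image texture, so the
    stereo term dominates in textured areas and shading in uniform ones.
    All inputs are arrays of per-point error/texture values.
    """
    t = np.clip(texture, 0.0, 1.0)   # normalized texture measure per point
    w_stereo = t                      # trust stereo where texture is high
    w_shading = 1.0 - t               # trust shading where texture is low
    w_smooth = 0.1                    # small constant regularizer (assumed)
    return np.sum(w_stereo * stereo_err
                  + w_shading * shading_err
                  + w_smooth * smooth_err)
```

This realizes the abstract's principle that each component has its greatest influence where its accuracy is likely to be greatest: in a textureless patch `t` is near zero, so the shading term carries the reconstruction there.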