In this paper we address the problem of minimizing a large class of energy functions that occur in early vision. The major restriction is that the energy function's smoothness term must only involve pairs of pixels. We propose two algorithms that use graph cuts to compute a local minimum even when very large moves are allowed. The first move we consider is an α-β-swap: for a pair of labels α, β, this move exchanges the labels between an arbitrary set of pixels labeled α and another arbitrary set labeled β. Our first algorithm generates a labeling such that there is no swap move that decreases the energy. The second move we consider is an α-expansion: for a label α, this move assigns an arbitrary set of pixels the label α. Our second algorithm, which requires the smoothness term to be a metric, generates a labeling such that there is no expansion move that decreases the energy. Moreover, this solution is within a known factor of the global minimum. We experimentally demonstrate the effectiveness of our approach on image restoration, stereo and motion.

Energy minimization in early vision

Many early vision problems require estimating some spatially varying quantity (such as intensity or disparity) from noisy measurements. Such quantities tend to be piecewise smooth: they vary smoothly at most points, but change dramatically at object boundaries. Every pixel p ∈ P must be assigned a label in some set L; for motion or stereo, the labels are disparities, while for image restoration they represent intensities. The goal is to find a labeling f that assigns each pixel p ∈ P a label f_p ∈ L, where f is both piecewise smooth and consistent with the observed data.

These vision problems can be naturally formulated in terms of energy minimization. In this framework, one seeks the labeling f that minimizes the energy

E(f) = E_smooth(f) + E_data(f).

Here E_smooth measures the extent to which f is not piecewise smooth, while E_data measures the disagreement between f and the observed data.
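The two move types can be made concrete with a minimal sketch. The function names, the 1D pixel indexing, and the explicit pixel sets below are illustrative assumptions, not part of the paper (which finds the best move via graph cuts rather than by enumerating sets):

```python
def apply_swap(f, alpha, beta, to_alpha, to_beta):
    """alpha-beta-swap: pixels in to_alpha (currently labeled beta) take
    label alpha, pixels in to_beta (currently labeled alpha) take label
    beta; every other pixel keeps its current label."""
    g = list(f)
    for p in to_alpha:
        assert f[p] == beta
        g[p] = alpha
    for p in to_beta:
        assert f[p] == alpha
        g[p] = beta
    return g

def apply_expansion(f, alpha, grow):
    """alpha-expansion: an arbitrary set of pixels `grow` takes label
    alpha; every other pixel keeps its current label."""
    g = list(f)
    for p in grow:
        g[p] = alpha
    return g

f = [0, 0, 1, 1, 2]
print(apply_swap(f, 0, 1, to_alpha=[2], to_beta=[1]))   # [0, 1, 0, 1, 2]
print(apply_expansion(f, 2, grow=[0, 3]))               # [2, 0, 1, 2, 2]
```

A labeling is a swap-move (resp. expansion-move) local minimum when no choice of these sets decreases the energy.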
Many different energy functions have been proposed in the literature. The form of E_data is typically

E_data(f) = Σ_{p∈P} D_p(f_p),

where D_p measures how appropriate a label is for the pixel p given the observed data. In image restoration, for example, D_p(f_p) = (f_p − i_p)², where i_p is the observed intensity of the pixel p.

The choice of E_smooth is a critical issue, and many different functions have been proposed. For example, in standard regularization-based vision [6], E_smooth makes f smooth everywhere. This leads to poor results at object boundaries. Energy functions that do not have this problem are called discontinuity-preserving. A large number of discontinuity-preserving energy functions have been proposed (see for example [7]). Geman and Geman's seminal paper [3] gave a Bayesian interpretation of many energy functions, and proposed a discontinuity-preserving energy function based on Markov Random Fields (MRFs).

The major difficulty with energy minimization for early vision lies in the enormous computational cost. Typically these energy functions have many local minima (i.e., they are non-convex). Worse still...
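The energy above can be evaluated directly. The sketch below computes E(f) = E_data(f) + E_smooth(f) for 1D image restoration with the quadratic data term from the text; the Potts smoothness term and the weight `lam` are illustrative assumptions (the Potts model is one common discontinuity-preserving choice: one large jump costs the same as a small one):

```python
def energy(f, i_obs, lam=1.0):
    # Data term: D_p(f_p) = (f_p - i_p)^2, summed over all pixels p.
    e_data = sum((fp - ip) ** 2 for fp, ip in zip(f, i_obs))
    # Smoothness term over neighboring pixel pairs: Potts model, a fixed
    # penalty lam whenever two adjacent pixels carry different labels.
    e_smooth = lam * sum(f[p] != f[p + 1] for p in range(len(f) - 1))
    return e_data + e_smooth

i_obs  = [10, 10, 11, 30, 30]   # noisy observed intensities i_p
f_flat = [10, 10, 10, 30, 30]   # a piecewise-constant labeling f
print(energy(f_flat, i_obs))    # 1 (data, pixel 2) + 1 (one label jump) = 2.0
```

Minimizing this energy trades fidelity to the observations against the number of label discontinuities, which is exactly what makes the problem non-convex and motivates the move-based algorithms above.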
Abstract. Among the most exciting advances in early vision has been the development of efficient energy minimization algorithms for pixel-labeling tasks such as depth or texture computation. It has been known for decades that such problems can be elegantly expressed as Markov random fields, yet the resulting energy minimization problems have been widely viewed as intractable. Recently, algorithms such as graph cuts and loopy belief propagation (LBP) have proven to be very powerful: for example, such methods form the basis for almost all the top-performing stereo methods. However, the trade-offs among different energy minimization algorithms are still not well understood. In this paper, we describe a set of energy minimization benchmarks and use them to compare the solution quality and runtime of several common energy minimization algorithms. We investigate three promising recent methods (graph cuts, LBP, and tree-reweighted message passing) in addition to the well-known older iterated conditional modes (ICM) algorithm. Our benchmark problems are drawn from published energy functions used for stereo, image stitching, interactive segmentation, and denoising. We also provide a general-purpose software interface that allows vision researchers to easily switch between optimization methods. The benchmarks, code, images, and results are available at http://vision.middlebury.edu/MRF/.
Abstract. Many methods for object recognition, segmentation, etc., rely on tessellation of an image into "superpixels". A superpixel is an image patch which is better aligned with intensity edges than a rectangular patch. Superpixels can be extracted with any segmentation algorithm; however, most algorithms produce highly irregular superpixels, with widely varying sizes and shapes. A more regular tessellation of the image may be desired. We formulate the superpixel partitioning problem in an energy minimization framework, and optimize with graph cuts. Our energy function explicitly encourages regular superpixels. We explore variations of the basic energy that trade tessellation regularity for more accurate boundaries or better efficiency. Our advantages over previous work are computational efficiency, principled optimization, and applicability to 3D "supervoxel" segmentation. We achieve high boundary recall on 2D images and spatial coherence on video. We also show that compact superpixels improve accuracy on a simple application of salient object segmentation.