Cosegmentation is typically defined as the task of jointly segmenting "something similar" in a given set of images. Existing methods are too generic and so far have not demonstrated competitive results for any specific task. In this paper we overcome this limitation by adding two new aspects to cosegmentation: (1) the "something" has to be an object, and (2) the "similarity" measure is learned. In this way, we are able to achieve excellent results on the recently introduced iCoseg dataset, which contains small sets of images of either the same object instance or similar objects of the same class. The challenge of this dataset lies in the extreme changes in viewpoint, lighting, and object deformations within each set. We are able to considerably outperform several competitors. To achieve this performance, we borrow recent ideas from object recognition: the use of powerful features extracted from a pool of candidate objectlike segmentations. We believe that our work will be beneficial to several application areas, such as image retrieval.
We address the problem of populating object category detection datasets with dense, per-object 3D reconstructions, bootstrapped from class labels, ground truth figureground segmentations and a small set of keypoint annotations. Our proposed algorithm first estimates camera viewpoint using rigid structure-from-motion, then reconstructs object shapes by optimizing over visual hull proposals guided by loose within-class shape similarity assumptions. The visual hull sampling process attempts to intersect an object's projection cone with the cones of minimal subsets of other similar objects among those pictured from certain vantage points. We show that our method is able to produce convincing per-object 3D reconstructions on one of the most challenging existing object-category detection datasets, PASCAL VOC. Our results may re-stimulate once popular geometry-oriented model-based recognition approaches.
Abstract. The problem of cosegmentation consists of segmenting the same object (or objects of the same class) in two or more distinct images. Recently a number of different models have been proposed for this problem. However, no comparison of such models and corresponding optimization techniques has been done so far. We analyze three existing models: the L1 norm model of Rother et al. [1], the L2 norm model of Mukherjee et al. [2] and the "reward" model of Hochbaum and Singh [3]. We also study a new model, which is a straightforward extension of the Boykov-Jolly model for single image segmentation [4]. In terms of optimization, we use a Dual Decomposition (DD) technique in addition to optimization methods in [1,2]. Experiments show a significant improvement of DD over published methods. Our main conclusion, however, is that the new model is the best overall because it: (i) has fewest parameters; (ii) is most robust in practice, and (iii) can be optimized well with an efficient EM-style procedure.
Many interactive image segmentation approaches use an objective function which includes appearance models as an unknown variable. Since the resulting optimization problem is NP-hard the segmentation and appearance are typically optimized separately, in an EM-style fashion. One contribution of this paper is to express the objective function purely in terms of the unknown segmentation, using higher-order cliques. This formulation reveals an interesting bias of the model towards balanced segmentations. Furthermore, it enables us to develop a new dual decomposition optimization procedure, which provides additionally a lower bound. Hence, we are able to improve on existing optimizers, and verify that for a considerable number of real world examples we even achieve global optimality. This is important since we are able, for the first time, to analyze the deficiencies of the model. Another contribution is to establish a property of a particular dual decomposition approach which involves convex functions depending on foreground area. As a consequence, we show that the optimal decomposition for our problem can be computed efficiently via a parametric maxflow algorithm.
Abstract. In this paper, we exploit an inextensibility prior as a way to better constrain the highly ambiguous problem of non-rigid reconstruction from monocular views. While this widely applicable prior has been used before combined with the strong assumption of a known 3D-template, our work achieves template-free reconstruction using only inextensibility constraints. We show how to formulate an energy function that includes soft inextensibility constraints and rely on existing discrete optimisation methods to minimise it. Our method has all of the following advantages: (i) it can be applied to two tasks that have been so far considered independently -template based reconstruction and non-rigid structure from motion -producing comparable or better results than the state-of-the art methods; (ii) it can perform template-free reconstruction from as few as two images; and (iii) it does not require post-processing stitching or surface smoothing.
This paper is the first work to propose a network to predict a structured uncertainty distribution for a synthesized image. Previous approaches have been mostly limited to predicting diagonal covariance matrices [15]. Our novel model learns to predict a full Gaussian covariance matrix for each reconstruction, which permits efficient sampling and likelihood evaluation.We demonstrate that our model can accurately reconstruct ground truth correlated residual distributions for synthetic datasets and generate plausible high frequency samples for real face images. We also illustrate the use of these predicted covariances for structure preserving image denoising.
Abstract-Reconstructing the shape of a deformable object from a single image is a challenging problem, even when a 3D template shape is available. Many different methods have been proposed for this problem, however what they have in common is that they are only able to reconstruct the part of the surface which is visible in a reference image.In contrast, we are interested in recovering the full shape of a deformable 3D object. We introduce a new method designed to reconstruct closed surfaces. This type of surface is better suited for representing objects with volume.Our method relies on recent advances in silhouette based reconstruction methods to obtain the template from a reference image. This template is then deformed in order to fit the measurements of a new input image. We combine an inextensibility prior on the deformation with powerful image measurements, in the form of silhouette and area constraints, to make our method less reliant on point correspondences.We show reconstruction results for different object classes, such as animals or hands, that have not been previously attempted with existing template methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.