Describing Visual Scenes Using Transformed Objects and Parts

Sudderth, Erik B.; Torralba, Antonio; Freeman, William T.; Willsky, Alan S.

doi:10.1007/s11263-007-0069-5

Cited by 172 publications

(161 citation statements)

References 51 publications

Supporting

Mentioning

156

Contrasting


“…This work has been extended to handle also spatial information [24] as well as part notions in infinite mixture models [18] and motion [12]. None of these models have presented a video segmentation prior or described a generative model for appearance classes across multiple videos.…”

Section: Related Work (mentioning)

confidence: 99%

Multi-class Video Co-segmentation with a Generative Multi-video Model

Chiu

Fritz

2013

2013 IEEE Conference on Computer Vision and Pattern Recognition

105


Video data provides a rich source of information that is available to us today in large quantities, e.g. from online resources. Tasks like segmentation benefit greatly from the analysis of spatio-temporal motion patterns in videos, and recent advances in video segmentation have shown great progress in exploiting these additional cues. However, observing a single video is often not enough to predict meaningful segmentations, and inference across videos becomes necessary in order to predict segmentations that are consistent with object classes. Therefore the task of video co-segmentation is being proposed, which aims at inferring segmentation from multiple videos. But current approaches are limited to only considering binary foreground/background segmentation and multiple videos of the same object. This is a clear mismatch to the challenges that we are facing with videos from online resources or consumer videos. We propose to study multi-class video co-segmentation where the number of object classes is unknown as well as the number of instances in each frame and video. We achieve this by formulating a non-parametric Bayesian model across video sequences that is based on a new video segmentation prior as well as a global appearance model that links segments of the same class. We present the first multi-class video co-segmentation evaluation. We show that our method is applicable to real video data from online resources and outperforms state-of-the-art video segmentation and image co-segmentation baselines.


“…These models borrow closely from models used in natural language processing, and express structural and appearance variation as the result of production rules. Hierarchical models for objects that include scene-level constraints have been presented in Singhal et al (2003), Sudderth et al (2005), which are very similar in spirit to our model. The contextual constraints, however, tend to strictly be relative position constraints.…”

Section: Related Work (mentioning)

confidence: 99%

A Hierarchical and Contextual Model for Aerial Image Parsing

2009


In this paper we present a hierarchical and contextual model for aerial image understanding. Our model organizes objects (cars, roofs, roads, trees, parking lots) in aerial scenes into hierarchical groups whose appearances and configurations are determined by statistical constraints (e.g. relative position, relative scale, etc.). Our hierarchy is a nonrecursive grammar for objects in aerial images comprised of layers of nodes that can each decompose into a number of different configurations. This allows us to generate and recognize a vast number of scenes with relatively few rules. We present a minimax entropy framework for learning the statistical constraints between objects and show that this learned context allows us to rule out unlikely scene configurations and hallucinate undetected objects during inference. A similar algorithm was proposed for texture synthesis (Zhu et al. in Int. J. Comput. Vis. 2:107-126, 1998) [...] image according to our learned prior model. The C4 algorithm can quickly and efficiently switch between alternate competing sub-solutions, for example whether an image patch is better explained by a parking lot with cars or by a building with vents. We also show that our model can predict the locations of objects our detectors missed. We conclude by presenting parsed aerial images and experimental results showing that our cluster sampling and top-down prediction algorithms use the learned contextual cues from our model to improve detection results over traditional bottom-up detectors alone.


“…One side pursues an exact reconstruction of the image, starting for instance with image segmentation and continuing with grouping operations (Marr, 1982;Witkin and Tenenbaum, 1983;Malik et al, 2001;Elder et al, 2003;Tu and Zhu, 2006). The aim of such approaches is to systematically extract scene information, which eventually leads to categorization, but actual transformations of structure have been pursued to a limited extent only (see (Sudderth et al, 2008) for image transformations for object detection). The other side attempts to avoid any elaborate reconstruction by preprocessing the image with 'simple' features or single transformations, whose output is then classified or matched (Oliva and Torralba, 2001;Renninger and Malik, 2004;Mori et al, 2005).…”

Section: Further Comparison To Other Approaches (mentioning)

confidence: 99%

An Approach to the Parameterization of Structure for Fast Categorization

Rasche

2009

Int J Comput Vis


A decomposition is described that parameterizes the geometry and appearance of contours and regions of gray-scale images with the goal of fast categorization. To express the contour geometry, a contour is transformed into a local/global space, from which parameters are derived that classify its global geometry (arc, inflexion or alternating) and describe its local aspects (degree of curvature, edginess, symmetry). Regions are parameterized based on their symmetric axes, which are evolved with a wave-propagation process that makes it possible to generate the distance map for fragmented contour images. The methodology is evaluated on three image sets: the Caltech 101 set and two sets drawn from the Corel collection. The performance nearly reaches that of other categorization systems for unsupervised learning.

