What, Where and How Many? Combining Object Detectors and CRFs

Ladický, Ľubor; Sturgess, Paul; Alahari, Karteek; Russell, Chris; Torr, Philip H. S.

doi:10.1007/978-3-642-15561-1_31

Cited by 241 publications

(269 citation statements)

References 36 publications

(96 reference statements)

Supporting

Mentioning

264

Contrasting

Order By: Relevance

“…However, we train and test on only the 11 classes shown in Fig. 2 as common practice when evaluating on CamVid [7,11,23]. For our framework, we consider cars, sign-symbol, pedestrian, column-pole, and bicyclist as the object classes and building, tree, sky, road, fence, and sidewalk as the background classes.…”

Section: Methodsmentioning

confidence: 99%

“…vertical, horizontal, sky) [8], and a model for camera viewpoint [9], which can provide powerful constraints on the plausible locations of objects in the scene [10]. [11] takes advantage of powerful sliding window detectors [12,13] to facilitate the labeling of objects (e.g. cars, persons) as they project to smaller regions in the image than background classes (e.g.…”

Section: Related Workmentioning

confidence: 99%

“…building, road, or sky), an issue our approach also addresses. Incorporating multiple cues facilitates [5,11,8,9] the joint optimization of coupled problems (e.g. semantic labels and depth estimates).…”

Section: Related Workmentioning

confidence: 99%

See 2 more Smart Citations

Semantic Video Segmentation from Occlusion Relations within a Convex Optimization Framework

Taylor

Ayvaci

Ravichandran

et al. 2013

Lecture Notes in Computer Science

View full text Add to dashboard Cite

Abstract. We describe an approach to incorporate scene topology and semantics into pixel-level object detection and localization. Our method requires video to determine occlusion regions and thence local depth ordering, and any visual recognition scheme that provides a score at local image regions, for instance object detection probabilities. We set up a cost functional that incorporates occlusion cues induced by object boundaries, label consistency and recognition priors, and solve it using a convex optimization scheme. We show that our method improves localization accuracy of existing recognition approaches, or equivalently provides semantic labels to pixel-level localization and segmentation.

show abstract

Section: Methodsmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

See 1 more Smart Citation

Semantic Video Segmentation from Occlusion Relations within a Convex Optimization Framework

Taylor

Ayvaci

Ravichandran

et al. 2013

Lecture Notes in Computer Science

View full text Add to dashboard Cite

show abstract

“…Kohli et al [15] formulate Robust P N clique potentials. The technique was used by Ladicky et al [16] to incorporate sliding window object detections, and extended into a multilevel hierarchical manner by Ladicky et al [17]. Using clique potentials comes at a cost of run-time, and solving for optimal parameters remains an open problem.…”

Section: Introductionmentioning

confidence: 99%

Sensor fusion for semantic segmentation of urban scenes

Zhang

Candra

Vetter

et al. 2015

2015 IEEE International Conference on Robotics and Automation (ICRA)

View full text Add to dashboard Cite

Abstract-Semantic understanding of environments is an important problem in robotics in general and intelligent autonomous systems in particular. In this paper, we propose a semantic segmentation algorithm which effectively fuses information from images and 3D point clouds. The proposed method incorporates information from multiple scales in an intuitive and effective manner. A late-fusion architecture is proposed to maximally leverage the training data in each modality. Finally, a pairwise Conditional Random Field (CRF) is used as a postprocessing step to enforce spatial consistency in the structured prediction. The proposed algorithm is evaluated on the publicly available KITTI dataset [1] [2], augmented with additional pixel and point-wise semantic labels for building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence regions. A per-pixel accuracy of 89.3% and average class accuracy of 65.4% is achieved, well above current state-of-the-art [3].

show abstract

“…Recent works, mainly related to scene understanding from images (e.g., [12][13][14]) have demonstrated how the application of segmentation processes for recognition tasks yields promising results, both in terms of object class recognition and correcting the segmentation of the searched objects. Such approaches are categorized as joint categorization and segmentation (JCaS).…”

Section: Introductionmentioning

confidence: 99%

Extraction of Objects from Terrestrial Laser Scans by Integrating Geometry Image and Intensity Data with Demonstration on Trees

Barnea

Filin

2012

Remote Sensing

View full text Add to dashboard Cite

Abstract:Terrestrial laser scanning is becoming a standard for 3D modeling of complex scenes. Results of the scan contain detailed geometric information about the scene; however, the lack of semantic details still constitutes a gap in ensuring this data is usable for mapping. This paper proposes a framework for recognition of objects in laser scans; aiming to utilize all the available information, range, intensity and color information integrated into the extraction framework. Instead of using the 3D point cloud, which is complex to process since it lacks an inherent neighborhood structure, we propose a polar representation which facilitates low-level image processing tasks, e.g., segmentation and texture modeling. Using attributes of each segment, a feature space analysis is used to classify segments into objects. This process is followed by a fine-tuning stage based on graph-cut algorithm, which considers the 3D nature of the data. The proposed algorithm is demonstrated on tree extraction and tested on scans containing complex objects in addition to trees. Results show a very high detection level and thereby the feasibility of the proposed framework.

show abstract

What, Where and How Many? Combining Object Detectors and CRFs

Cited by 241 publications

References 36 publications

Semantic Video Segmentation from Occlusion Relations within a Convex Optimization Framework

Semantic Video Segmentation from Occlusion Relations within a Convex Optimization Framework

Sensor fusion for semantic segmentation of urban scenes

Extraction of Objects from Terrestrial Laser Scans by Integrating Geometry Image and Intensity Data with Demonstration on Trees

Contact Info

Product

Resources

About