Joint Semantic Segmentation and 3D Reconstruction from Monocular Video

Kundu, Abhijit; Li, Yin; Dellaert, Frank; Li, Fuxin; Rehg, James M.

doi:10.1007/978-3-319-10599-4_45

Cited by 198 publications

(172 citation statements)

References 28 publications

Mentioning: 171

“…These methods have also been combined with deep networks [2,20]. For the 7 subsets of the KITTI dataset used in this paper [9,13,14,18,19,22,25], deep learning has never been used to tackle the semantic segmentation step. For example, [14] shows how to jointly classify pixels and predict their depth using a multi-class decision stumps-based boosted classifier.…”

Section: Related Work (mentioning)

confidence: 99%

“…For example, in the KITTI dataset (see Fig. 2 where all labels are reported) the class Tree of the dataset from He et al [9] is correlated with the class Vegetation from the dataset labeled by Kundu et al [13]. A plain softmax, optimizing the probability of the Tree class will implicitly penalize the probability of Vegetation, which is not a desired effect.…”

Section: Proposed Approach (mentioning)

confidence: 99%

“…Most datasets for scene parsing contain only several hundreds of images, some of them only several dozen [6,15,9,13,14,18,19,22,25]. Additionally, combining these datasets is a non-trivial task as target classes are often tailored to a custom application.…”

Section: Introduction (mentioning)

confidence: 99%

“…One good example of such label-set diversity can be found within the KITTI Vision benchmark [4] which contains outdoor scene videos. Many research teams work on this dataset since its release in 2013, tackling computer vision tasks such as visual odometry, 3D object detection and 3D tracking [9,13,14,18,19,22,25]. To tackle these tasks, several research teams have labeled parts of the original dataset, independently from the other teams and often for different goals (among the works listed above, semantic segmentation is the final goal only for [14] and [25]).…”

Section: Introduction (mentioning)

confidence: 99%

Semantic Segmentation via Multi-task, Multi-domain Learning

Fourure, Emonet, Fromont et al. 2016

Lecture Notes in Computer Science

Abstract. We present an approach that leverages multiple datasets possibly annotated using different classes to improve the semantic segmentation accuracy on each individual dataset. We propose a new selective loss function that can be integrated into deep networks to exploit training data coming from multiple datasets with possibly different tasks (e.g., different label-sets). We show how the gradient-reversal approach for domain adaptation can be used in this setup. Thorough experiments on semantic segmentation applications show the relevance of our approach.
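The selective loss named in this abstract, together with the Tree/Vegetation remark in the citation statement above, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the loss is a cross-entropy whose softmax is normalized only over the labels annotated in the dataset a sample comes from; the function name, label indices, and logits below are hypothetical.

```python
import math

def selective_cross_entropy(logits, target, active_labels):
    """Cross-entropy with the softmax restricted to `active_labels`.

    logits: list of floats, one score per label in the union of all label sets.
    target: index (in the union label set) of the true label.
    active_labels: indices of the labels annotated in the sample's own dataset.
    """
    if target not in active_labels:
        raise ValueError("target must belong to the sample's own label set")
    # Log-sum-exp over the active labels only, shifted for numerical stability.
    m = max(logits[k] for k in active_labels)
    log_z = m + math.log(sum(math.exp(logits[k] - m) for k in active_labels))
    return log_z - logits[target]

# Hypothetical union label set: 0 = Road, 1 = Tree (dataset A), 2 = Vegetation (dataset B).
logits = [0.0, 2.0, 2.0]

# Plain softmax over all labels: a high Vegetation score raises the loss of a Tree pixel.
plain = selective_cross_entropy(logits, 1, [0, 1, 2])

# Selective softmax over dataset A's labels only: the Vegetation score is ignored.
selective = selective_cross_entropy(logits, 1, [0, 1])
```

Under this assumption, a pixel labeled Tree in one dataset contributes no gradient to the Vegetation output, so correlated classes from different label sets are not pushed against each other.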

“…While impressive results can be achieved with multi-view and video-based approaches [1][2][3][4], the progress of depth sensors and their decreasing prices make them an attractive alternative, able to capture 3D in a single shot [5]. Unfortunately, even the best depth sensors still provide imperfect measurements.…”

Section: Introduction (mentioning)

confidence: 99%

Building Scene Models by Completing and Hallucinating Depth and Semantics

Liu, Salzmann 2016

Computer Vision – ECCV 2016

Abstract. Building 3D scene models has been a longstanding goal of computer vision. The great progress in depth sensors brings us one step closer to achieving this in a single shot. However, depth sensors still produce imperfect measurements that are sparse and contain holes. While depth completion aims at tackling this issue, it ignores the fact that some regions of the scene are occluded by the foreground objects. Building a scene model would therefore require to hallucinate the depth behind these objects. In contrast with existing methods that either rely on manual input, or focus on the indoor scenario, we introduce a fully-automatic method to jointly complete and hallucinate depth and semantics in challenging outdoor scenes. To this end, we develop a two-layer model representing both the visible information and the hidden one. At the heart of our approach lies a formulation based on the Mumford-Shah functional, for which we derive an effective optimization strategy. Our experiments evidence that our approach can accurately fill the large holes in the input depth maps, segment the different kinds of objects in the scene, and hallucinate the depth and semantics behind the foreground objects.

Radiation search operations using scene understanding with autonomous UAV and UGV

Christie, Shoemaker, Kochersberger et al. 2017

Journal of Field Robotics

Autonomously searching for hazardous radiation sources requires the ability of the aerial and ground systems to understand the scene they are scouting. In this paper, we present systems, algorithms, and experiments to perform radiation search using unmanned aerial vehicles (UAV) and unmanned ground vehicles (UGV) by employing semantic scene segmentation. The aerial data is used to identify radiological points of interest, generate an orthophoto along with a digital elevation model (DEM) of the scene, and perform semantic segmentation to assign a category (e.g. road, grass) to each pixel in the orthophoto. We perform semantic segmentation by training a model on a dataset of images we collected and annotated, using the model to perform inference on images of the test area unseen to the model, and then refining the results with the DEM to better reason about category predictions at each pixel. We then use all of these outputs to plan a path for a UGV carrying a LiDAR to map the environment and avoid obstacles not present during the flight, and a radiation detector to collect more precise radiation measurements from the ground. Results of the analysis for each scenario tested favorably. We also note that our approach is general and has the potential to work for a variety of different sensing tasks.
