2014
DOI: 10.1007/978-3-319-10599-4_45
|View full text |Cite
|
Sign up to set email alerts
|

Joint Semantic Segmentation and 3D Reconstruction from Monocular Video

Abstract: Abstract. We present an approach for joint inference of 3D scene structure and semantic labeling for monocular video. Starting with monocular image stream, our framework produces a 3D volumetric semantic + occupancy map, which is much more useful than a series of 2D semantic label images or a sparse point cloud produced by traditional semantic segmentation and Structure from Motion(SfM) pipelines respectively. We derive a Conditional Random Field (CRF) model defined in the 3D space, that jointly infers the sem… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

0
171
0
1

Year Published

2016
2016
2018
2018

Publication Types

Select...
6
1

Relationship

0
7

Authors

Journals

citations
Cited by 198 publications
(172 citation statements)
references
References 28 publications
0
171
0
1
Order By: Relevance
“…These methods have also been combined with deep networks [2,20]. For the 7 subsets of the KITTI dataset used in this paper [9,13,14,18,19,22,25], deep learning has never been used to tackle the semantic segmentation step. For example, [14] shows how to jointly classify pixels and predict their depth using a multi-class decision stumps-based boosted classifier.…”
Section: Related Workmentioning
confidence: 99%
See 3 more Smart Citations
“…These methods have also been combined with deep networks [2,20]. For the 7 subsets of the KITTI dataset used in this paper [9,13,14,18,19,22,25], deep learning has never been used to tackle the semantic segmentation step. For example, [14] shows how to jointly classify pixels and predict their depth using a multi-class decision stumps-based boosted classifier.…”
Section: Related Workmentioning
confidence: 99%
“…For example, in the KITTI dataset (see Fig. 2 where all labels are reported) the class Tree of the dataset from He et al [9] is correlated with the class Vegetation from the dataset labeled by Kundu et al [13]. A plain softmax, optimizing the probability of the Tree class will implicitly penalize the probability of Vegetation, which is not a desired effect.…”
Section: Proposed Approachmentioning
confidence: 99%
See 2 more Smart Citations
“…While impressive results can be achieved with multi-view and videobased approaches [1][2][3][4], the progress of depth sensors and their decreasing prices make them an attractive alternative, able to capture 3D in a single shot [5]. Unfortunately, even the best depth sensors still provide imperfect measurements.…”
Section: Introductionmentioning
confidence: 99%