2016 Fourth International Conference on 3D Vision (3DV)
DOI: 10.1109/3dv.2016.69
Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks

Abstract: Multi-scale deep CNNs have been used successfully for problems mapping each pixel to a label, such as depth estimation and semantic segmentation. It has also been shown that such architectures are reusable and can be used for multiple tasks. These networks are typically trained independently for each task by varying the output layer(s) and training objective. In this work we present a new model for simultaneous depth estimation and semantic segmentation from a single RGB image. Our approach demonstrates the fe…
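The abstract describes a single network that maps one RGB image to both per-pixel depth and per-pixel semantic labels. As a rough illustration of that idea only (not the paper's actual multi-scale, CRF-coupled architecture), the sketch below wires a shared convolutional backbone to two task heads and combines a depth-regression loss with a per-pixel classification loss; the layer sizes, the 40-class count, and all module names are illustrative assumptions.

```python
# Minimal sketch of joint per-pixel depth + segmentation prediction.
# NOT the paper's architecture; layer sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointDepthSegNet(nn.Module):
    def __init__(self, num_classes=40):  # 40 classes: a common NYU V2 label set (assumption)
        super().__init__()
        # Shared feature extractor standing in for the multi-scale CNN backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Task-specific heads: one depth channel, num_classes segmentation channels.
        self.depth_head = nn.Conv2d(128, 1, 1)
        self.seg_head = nn.Conv2d(128, num_classes, 1)

    def forward(self, rgb):
        feats = self.backbone(rgb)
        return self.depth_head(feats), self.seg_head(feats)

model = JointDepthSegNet()
rgb = torch.randn(1, 3, 240, 320)                          # dummy RGB input
depth_pred, seg_logits = model(rgb)
# Joint objective: depth regression plus per-pixel classification (dummy targets).
loss = F.l1_loss(depth_pred, torch.rand_like(depth_pred)) + \
       F.cross_entropy(seg_logits, torch.zeros(1, 240, 320, dtype=torch.long))
loss.backward()
```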

citations: Cited by 136 publications (121 citation statements)
references: References 24 publications
“…Ladicky et al. [36] combine depth regression with semantic classification, deploying a bag-of-visual-words model and a boosting algorithm. Mousavian et al. [37] deploy a multi-scale CNN to estimate depth, which is then used within a CRF to obtain semantic segmentation. Wang et al. [38] use local and global CNNs to extract pixel and region potentials, which are fed to a CRF.…”
Section: Related Work
confidence: 99%
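The summary above describes pipelines in which CNN predictions act as unary potentials that a CRF then refines into the final segmentation. The sketch below is a hedged illustration of that general CNN-unaries-plus-dense-CRF pattern using the pydensecrf package; it is not the exact formulation of [36]-[38] or of this paper, and the kernel parameters are illustrative.

```python
# Hedged sketch: dense-CRF refinement of CNN per-pixel class probabilities.
# Kernel parameters are illustrative, not values from the cited papers.
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(rgb_uint8, class_probs, n_iters=5):
    """rgb_uint8: (H, W, 3) uint8 image; class_probs: (C, H, W) softmax scores."""
    n_classes, height, width = class_probs.shape
    d = dcrf.DenseCRF2D(width, height, n_classes)
    d.setUnaryEnergy(unary_from_softmax(class_probs))   # -log(p) unaries from the CNN
    d.addPairwiseGaussian(sxy=3, compat=3)              # location-only smoothness kernel
    d.addPairwiseBilateral(sxy=60, srgb=10,             # appearance (color + location) kernel
                           rgbim=np.ascontiguousarray(rgb_uint8), compat=5)
    q = d.inference(n_iters)                            # mean-field inference
    return np.argmax(np.array(q).reshape(n_classes, height, width), axis=0)
```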
“…SSeg: The output of the segmenter defined by Mousavian et al. [29], trained on the NYU V2 dataset [30]. The resulting representation is an H × W × C_sseg mask stack, where C_sseg is the number of categories in the NYU V2 dataset.…”
Section: B. Visual Representations
confidence: 99%
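The quoted passage represents the segmenter output as an H × W × C_sseg mask stack with one channel per category. The snippet below is a small sketch of building such a stack from a predicted label map; the 40-category count (a common NYU V2 label set) and the function name are assumptions, not taken from the cited work.

```python
# Sketch: turn an (H, W) label map into an (H, W, C_sseg) one-hot mask stack.
# The 40-class count is an assumption (a common NYU V2 label set).
import numpy as np

def to_mask_stack(label_map, num_classes=40):
    h, w = label_map.shape
    stack = np.zeros((h, w, num_classes), dtype=np.float32)
    stack[np.arange(h)[:, None], np.arange(w)[None, :], label_map] = 1.0
    return stack

labels = np.random.randint(0, 40, size=(240, 320))   # dummy per-pixel class indices
masks = to_mask_stack(labels)                        # shape (240, 320, 40)
```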
“…Then the gap between real and simulated data is related to the quality of the segmenter and detector. This setup is particularly timely, as research on object detection [28], semantic segmentation [35], and depth estimation [29] has been propelled by deep learning methods, with a variety of high-performing models available. We show empirically that by using simulation with these representations we achieve substantial improvement without any domain adaptation.…”
Section: B. Visual Representations
confidence: 99%
“…Several researchers have recently attempted this demanding challenge, i.e., building a model that can simultaneously learn multiple tasks with different outputs. Mousavian et al. [310] undertook joint people detection in tandem with re-identification, while Van Ranst et al. [311] tackled image segmentation with depth estimation. However, further exploration and investigation are needed to overcome this challenge.…”
Section: G. Other Challenges
confidence: 99%