2017 IEEE International Conference on Robotics and Automation (ICRA) 2017
DOI: 10.1109/icra.2017.7989537

Analyzing modular CNN architectures for joint depth prediction and semantic segmentation

Abstract: This paper addresses the task of designing a modular neural network architecture that jointly solves different tasks. As an example we use the tasks of depth estimation and semantic segmentation given a single RGB image. The main focus of this work is to analyze the cross-modality influence between depth and semantic prediction maps on their joint refinement. While most previous works solely focus on measuring improvements in accuracy, we propose a way to quantify the cross-modality influence. We show…

Cited by 53 publications (35 citation statements)
References 36 publications (60 reference statements)
“…For instance, [47] built a hierarchical CRF with CNN to leverage the geometric cue, and [22] proposed a cross-task uncertainty. There are other works proposed to jointly learn the two tasks with various techniques, including finetuning [33], cross-modality influence [19], task distillation module with intermediate auxiliary tasks [48], recursive estimation [51], task attention loss [20]. More broadly speaking, the idea of jointly learning semantic segmentation and depth estimation can be connected to multi-task learning [23], where multiple outputs are produced by a single network.…”
Section: Related Work (mentioning)
confidence: 99%
“…Starting from the work of Long et al [19], fully convolutional encoder-decoder networks have been a staple in semantic segmentation. Although we do not address semantic segmentation, we leverage per-pixel semantic labeling enabled by existing systems to aid depth prediction in the form of providing class-specific priors and an attention mechanism to selectively apply such priors, which is different from joint segmentation and depth prediction approaches [10].…”
Section: Related Work (mentioning)
confidence: 99%
“…[24] also adopted a single neural network to do semantic labeling, depth prediction and surface normal estimation. In work [25], the authors analyzed the cross-modality influences between semantic segmentation and depth prediction and then designed a network architecture to balance the cross-modality influences and achieve improved results. Despite the good performance these methods achieved, multi-step training process is still required, that leads to heavy computational load in learning and using these models.…”
Section: Related Work (mentioning)
confidence: 99%