Analyzing modular CNN architectures for joint depth prediction and semantic segmentation

Jafari, Omid Hosseini; Groth, Oliver; Kirillov, Alexander; Yang, Michael Ying; Rother, Carsten

doi:10.1109/icra.2017.7989537

Cited by 53 publications

(35 citation statements)

References 36 publications

(60 reference statements)

“…For instance, [47] built a hierarchical CRF with CNN to leverage the geometric cue, and [22] proposed a cross-task uncertainty. There are other works proposed to jointly learn the two tasks with various techniques, including fine-tuning [33], cross-modality influence [19], task distillation module with intermediate auxiliary tasks [48], recursive estimation [51], task attention loss [20]. More broadly speaking, the idea of jointly learning semantic segmentation and depth estimation can be connected to multi-task learning [23], where multiple outputs are produced by a single network.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Semantic Segmentation From Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach

Chen et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Recently, increasing attention has been drawn to training semantic segmentation models using synthetic data and computer-generated annotation. However, domain gap remains a major barrier and prevents models learned from synthetic data from generalizing well to real-world applications. In this work, we take advantage of additional geometric information from synthetic data, a powerful yet largely neglected cue, to bridge the domain gap. Such geometric information can be generated easily from synthetic data, and is proven to be closely coupled with semantic information. With the geometric information, we propose a model to reduce domain shift on two levels: on the input level, we augment the traditional image translation network with the additional geometric information to translate synthetic images into realistic styles; on the output level, we build a task network which simultaneously performs depth estimation and semantic segmentation on the synthetic data. Meanwhile, we encourage the network to preserve the correlation between depth and semantics by adversarial training on the output space. We then validate our method on two synthetic-to-real dataset pairs, Virtual KITTI→KITTI and SYNTHIA→Cityscapes, where we achieve a significant performance gain compared to the non-adaptive baseline and methods without using geometric information. This demonstrates the usefulness of geometric information from synthetic data for cross-domain semantic segmentation.

“…Starting from the work of Long et al. [19], fully convolutional encoder-decoder networks have been a staple in semantic segmentation. Although we do not address semantic segmentation, we leverage per-pixel semantic labeling enabled by existing systems to aid depth prediction in the form of providing class-specific priors and an attention mechanism to selectively apply such priors, which is different from joint segmentation and depth prediction approaches [10].…”

Section: Related Work (mentioning)

confidence: 99%

Geo-Supervised Visual Depth Prediction

Fei, Wong, and Soatto 2019

IEEE Robot. Autom. Lett.

We propose using global orientation from inertial measurements, and the bias it induces on the shape of objects populating the scene, to inform visual 3D reconstruction. We test the effect of using the resulting prior in depth prediction from a single image, where the normal vectors to surfaces of objects of certain classes tend to align with gravity or be orthogonal to it. Adding such a prior to baseline methods for monocular depth prediction yields improvements beyond the state-of-the-art and illustrates the power of gravity as a supervisory signal.

“…[24] also adopted a single neural network to do semantic labeling, depth prediction and surface normal estimation. In work [25], the authors analyzed the cross-modality influences between semantic segmentation and depth prediction and then designed a network architecture to balance the cross-modality influences and achieve improved results. Despite the good performance these methods achieved, multi-step training process is still required, that leads to heavy computational load in learning and using these models.…”

Section: Related Work (mentioning)

confidence: 99%

GAPLE: Generalizable Approaching Policy LEarning for Robotic Object Searching in Indoor Environment

Lin, Lee, et al. 2019

IEEE Robot. Autom. Lett.

We study the problem of learning a generalizable action policy for an intelligent agent to actively approach an object of interest in an indoor environment solely from its visual inputs. While scene-driven or recognition-driven visual navigation has been widely studied, prior efforts suffer severely from limited generalization capability. In this paper, we first argue that the object searching task is environment dependent while the approaching ability is general. To learn a generalizable approaching policy, we present a novel solution dubbed GAPLE, which adopts two channels of visual features, depth and semantic segmentation, as the inputs to the policy learning module. The empirical studies conducted on the House3D dataset as well as on a physical platform in a real-world scenario validate our hypothesis, and we further provide in-depth qualitative analysis.
