2019 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2019.8793479
Beyond Photometric Loss for Self-Supervised Ego-Motion Estimation

Abstract: Accurate relative pose is one of the key components in visual odometry (VO) and simultaneous localization and mapping (SLAM). Recently, the self-supervised learning framework that jointly optimizes the relative pose and target image depth has attracted the attention of the community. Previous works rely on the photometric error generated from depths and poses between adjacent frames, which contains large systematic error under realistic scenes due to reflective surfaces and occlusions. In this paper, we bridge…
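The photometric error the abstract refers to comes from synthesizing the target view from an adjacent frame using the predicted depth and relative pose, then comparing the synthesized view with the real target image. Below is a minimal PyTorch sketch of that reconstruction loss; the function names (warp_to_target, photometric_loss) and the plain L1 penalty are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of the photometric reconstruction loss used in
# self-supervised depth/ego-motion training. Names are illustrative.
import torch
import torch.nn.functional as F

def warp_to_target(src_img, depth, pose, K):
    """Synthesize the target view by warping the source image.

    src_img: (B, 3, H, W) source frame
    depth:   (B, 1, H, W) predicted depth of the target frame
    pose:    (B, 4, 4)    relative pose, target -> source
    K:       (B, 3, 3)    camera intrinsics
    """
    B, _, H, W = src_img.shape
    device = src_img.device

    # Pixel grid of the target image in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).view(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3-D, transform into the source frame, re-project.
    cam = torch.inverse(K) @ pix * depth.view(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src_cam = (pose @ cam_h)[:, :3]
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)

    # Normalize to [-1, 1] and bilinearly sample the source image.
    u = 2.0 * src_pix[:, 0] / (W - 1) - 1.0
    v = 2.0 * src_pix[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)

def photometric_loss(tgt_img, src_img, depth, pose, K):
    # Plain L1 error between the target frame and the synthesized view;
    # the paper argues this alone is unreliable near reflections/occlusions.
    recon = warp_to_target(src_img, depth, pose, K)
    return (tgt_img - recon).abs().mean()
```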

Cited by 69 publications (47 citation statements). References 48 publications.
“…For different scenes and different sensors, these methods are difficult to transfer, because the sensors must be photometrically recalibrated and a correct uncertainty map must be formed for the matching points. Modern enhancements of these approaches are neural-network methods trained in a self-supervised manner: D3VO [35], DeepMatchVO [36], and DF-VO [37]. All of them estimate the pose between two neighboring monocular frames together with a depth map.…”
Section: Visual-based Robot Localization (mentioning)
confidence: 99%
“…where the threshold factor controls the selecting capability of the mask; we set it to 0.45 in our experimental settings. We also strengthen both masks by filtering out the cross-frame static pixels and the non-principal area [22,40], which are commonly used in several state-of-the-art methods [22,26,40,52].…”
Section: Combined Selective Mask (mentioning)
confidence: 99%
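For illustration, here is a minimal sketch of such a threshold-based selective mask, assuming the mask keeps pixels whose photometric error falls below the threshold factor times the per-image mean error. The cited statement only fixes the factor at 0.45; the exact comparison rule used below is an assumption.

```python
import torch

def selective_mask(error_map: torch.Tensor, alpha: float = 0.45) -> torch.Tensor:
    """Binary mask selecting reliable pixels from a photometric error map.

    error_map: (B, 1, H, W) per-pixel photometric error.
    alpha:     threshold factor; smaller values select fewer pixels.
    Comparing against alpha * mean(error) is an assumption; the cited
    paper only states that the threshold factor is set to 0.45.
    """
    mean_err = error_map.mean(dim=(2, 3), keepdim=True)
    return (error_map < alpha * mean_err).float()
```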
“…The training of the depth and ego-motion networks is entangled during self-supervision [52,74]. Thus, enhancing the ego-motion estimation with extra supervision is also beneficial for depth estimation.…”
Section: Cycle Consistency Constraint (mentioning)
confidence: 99%
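A cycle consistency constraint on ego-motion typically requires that the predicted forward and backward poses compose to the identity transform. A hedged sketch, assuming 4x4 homogeneous pose matrices and a simple elementwise penalty (the cited papers may use a different pose parameterization):

```python
import torch

def pose_cycle_loss(T_fwd: torch.Tensor, T_bwd: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the composed forward/backward poses from identity.

    T_fwd: (B, 4, 4) predicted pose from frame a to frame b.
    T_bwd: (B, 4, 4) predicted pose from frame b to frame a.
    """
    B = T_fwd.shape[0]
    eye = torch.eye(4, device=T_fwd.device).expand(B, 4, 4)
    return ((T_bwd @ T_fwd) - eye).abs().mean()
```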
“…Previous works employ an additional branch to regress an uncertainty map, which helps only slightly [54]. Instead, we follow the explicit occlusion modeling approach [41], which does not rely on data-driven uncertainty. We have observed that photometric inconsistency, such as moving objects, usually incurs larger photometric errors (Figure 4(b)).…”
Section: Differentiable Sparse Feature Selection (mentioning)
confidence: 99%
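One common way to model occlusions explicitly without a learned uncertainty map is to take the per-pixel minimum of the photometric error over several synthesized source views, so a pixel occluded in one view can still be explained by another. The sketch below shows that idea; whether [41] uses exactly this scheme is an assumption.

```python
import torch

def min_reprojection_error(tgt, recons):
    """Per-pixel minimum photometric error over several synthesized views.

    Taking the minimum over source frames discounts pixels that are occluded
    (and hence badly reconstructed) in any single view, exploiting the fact
    that occlusions and moving objects tend to produce large errors. This is
    one common form of explicit occlusion handling, not necessarily [41]'s.

    tgt:    (B, 3, H, W) target frame
    recons: list of (B, 3, H, W) views synthesized from different sources
    """
    errs = torch.stack(
        [(tgt - r).abs().mean(dim=1, keepdim=True) for r in recons]
    )
    return errs.min(dim=0).values  # (B, 1, H, W)
```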