2022 International Conference on Robotics and Automation (ICRA) 2022
DOI: 10.1109/icra46639.2022.9811842

Self-Supervised Ego-Motion Estimation Based on Multi-Layer Fusion of RGB and Inferred Depth

Abstract: Self-supervised monocular scene flow estimation, aiming to understand both 3D structures and 3D motions from two temporally consecutive monocular images, has received increasing attention for its simple and economical sensor setup. However, the accuracy of current methods suffers from the bottleneck of less-efficient network architecture and lack of motion rigidity for regularization. In this paper, we propose a superior model named EMR-MSF by borrowing the advantages of network architecture design under the s…

Citations: cited by 9 publications (8 citation statements)
References: 83 publications
“…Compared with VINS-Mono [23], the advantage of our proposed approach not only exists in fusion stage but also in front-end feature extraction which we mentioned in Section II.A. Besides, the improvement compared with EMA-VIO [1] which also deploys Transformer-based approach for fusion, is possibly that the multi-layer fusion module aggregates the LiDAR and inertial data at different scale [32] [31].…”
Section: Positioning Results On KITTI Dataset
Type: mentioning
confidence: 99%
“…The number of multi-head is set to 4. Most of empirical hyperparameters are referenced from other related works, such as MLF-VO [32] and Transfuser [48].…”
Section: Methods
Type: mentioning
confidence: 99%
“…Song et al (2021) used deep learning to obtain prior models of objects, combining object-level information with Visual SLAM and estimating object and camera poses through the extended Kalman filter. Jiang et al (2022) estimated relative pose by fusing depth maps and RGB maps, demonstrating the feasibility of this fusion approach using the KITTI data set. Zhou et al (2022a, 2022b) introduced local plane constraints to Visual SLAM, assuming local regions of flat ground, which finds applications in the field of autonomous driving.…”
Section: Some Of the Latest Research In Visual Simultaneous Localizat...
Type: mentioning
confidence: 90%