2017 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2017.7989236

DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks

Abstract: This paper studies the monocular visual odometry (VO) problem. Most existing VO algorithms are developed under a standard pipeline including feature extraction, feature matching, motion estimation, and local optimisation. Although some of them have demonstrated superior performance, they usually need to be carefully designed and specifically fine-tuned to work well in different environments. Some prior knowledge is also required to recover an absolute scale for monocular VO. This paper presents a novel end-to-…
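The abstract describes replacing the classic feature-based pipeline with a recurrent convolutional network trained end-to-end. As a rough illustration of that idea (this is not the authors' released implementation; the encoder depth, hidden size, frame-pair stacking and 6-DoF output parameterisation are assumptions for illustration), a DeepVO-style model might be sketched as follows:

```python
# Minimal sketch of a DeepVO-style recurrent convolutional network (PyTorch).
# NOT the authors' implementation; layer sizes and the FlowNet-like encoder
# are assumptions used only to illustrate the CNN + LSTM structure.
import torch
import torch.nn as nn

class RCNNVO(nn.Module):
    def __init__(self, hidden_size=1000):
        super().__init__()
        # CNN encoder applied to two consecutive frames stacked on the
        # channel axis (2 x RGB = 6 channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Recurrent part models temporal dependencies across the sequence.
        self.rnn = nn.LSTM(256 * 4 * 4, hidden_size, num_layers=2, batch_first=True)
        # Regress a 6-DoF relative pose (translation + rotation) per step.
        self.head = nn.Linear(hidden_size, 6)

    def forward(self, frames):               # frames: (B, T, 3, H, W)
        pairs = torch.cat([frames[:, :-1], frames[:, 1:]], dim=2)  # (B, T-1, 6, H, W)
        b, t = pairs.shape[:2]
        feats = self.encoder(pairs.flatten(0, 1)).flatten(1)       # (B*(T-1), F)
        out, _ = self.rnn(feats.view(b, t, -1))                    # (B, T-1, hidden)
        return self.head(out)                                      # (B, T-1, 6)

# Toy usage; real KITTI frames are larger than this dummy input.
poses = RCNNVO()(torch.randn(2, 5, 3, 64, 192))
print(poses.shape)  # torch.Size([2, 4, 6])
```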

Cited by 687 publications (603 citation statements)
References 27 publications
“…Sequence 10

Model            t_rel   r_rel   t_rel   r_rel
Two-stream [16]  0.0554  0.0830  0.0870  0.1592
ResNet18 [10]    0.1094  0.0602  0.1443  0.1327
DeepVO [20]      0.2157  0.0709  0.2153  0.3311
PointNet [17]    0.0946  0.0442  0.1381  0.1360
PointGrid [12]   0.0550  0.0690  0.0842  0.1523
DeepPCO (Ours)   0.0263  0.0305  0.0247  0.0659…”
Section: Sequence 04 (mentioning)
confidence: 99%
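The t_rel and r_rel figures in tables like this are relative translational and rotational errors on KITTI-style trajectories. As a hedged illustration (the official KITTI benchmark averages errors over sub-sequences of 100-800 m; the per-frame version below is a simplified approximation, and the function name is ours), the metrics can be approximated as:

```python
# Simplified sketch of relative translational / rotational error metrics
# (t_rel, r_rel). The official KITTI evaluation averages over sub-sequences
# of fixed lengths; this per-frame version is illustrative only.
import numpy as np

def relative_errors(pred_rel, gt_rel):
    """pred_rel, gt_rel: lists of 4x4 relative pose matrices (frame i -> i+1)."""
    t_err, r_err, dist = 0.0, 0.0, 0.0
    for P, G in zip(pred_rel, gt_rel):
        E = np.linalg.inv(G) @ P                    # residual transform
        t_err += np.linalg.norm(E[:3, 3])           # translation residual (m)
        cos = (np.trace(E[:3, :3]) - 1.0) / 2.0     # rotation residual (rad)
        r_err += np.arccos(np.clip(cos, -1.0, 1.0))
        dist += np.linalg.norm(G[:3, 3])            # ground-truth path length
    return t_err / dist, r_err / dist               # errors per metre travelled

# Toy usage: 1 m forward per frame, one step with 5 cm lateral drift.
G = [np.eye(4) for _ in range(10)]
for g in G:
    g[0, 3] = 1.0
P = [g.copy() for g in G]
P[0][1, 3] = 0.05
print(relative_errors(P, G))  # (0.005, 0.0)
```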
“…The key concept of our architecture is inspired by [5], which uses an LSTM for action recognition in videos. A similar approach was also used in [32] to predict a full 6-DoF camera pose. In contrast to the previous approaches, we also evaluate a bidirectional LSTM version (see Figure 3).…”
Section: Layer (mentioning)
confidence: 99%
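This statement describes feeding per-frame features to an LSTM, additionally evaluated in a bidirectional variant, to regress a full 6-DoF camera pose. A minimal sketch of such a bidirectional-LSTM pose head, assuming PyTorch and with feature and hidden sizes chosen for illustration rather than taken from the cited paper, could look like:

```python
# Hedged sketch of a bidirectional-LSTM pose head: per-frame CNN features are
# processed in both temporal directions before regressing a 6-DoF pose.
# Feature and hidden sizes are assumptions, not the cited paper's values.
import torch
import torch.nn as nn

class BiLSTMPoseHead(nn.Module):
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 6)   # forward + backward states -> 6-DoF pose

    def forward(self, feats):                # feats: (B, T, feat_dim) per-frame features
        out, _ = self.rnn(feats)             # (B, T, 2*hidden)
        return self.fc(out)                  # (B, T, 6) per-frame pose estimates

head = BiLSTMPoseHead()
print(head(torch.randn(1, 8, 512)).shape)   # torch.Size([1, 8, 6])
```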
“…We aim at utilizing multiple observations from a sequence to reduce the ambiguity of a single image. Since images are high-dimensional data with much redundant information, learning from raw data is ineffective [34]. On the other hand, LSTMs cannot preserve long-term knowledge [29].…”
Section: Content-augmented Pose Estimation (mentioning)
confidence: 99%