2017 IEEE International Conference on Computer Vision (ICCV) 2017
DOI: 10.1109/iccv.2017.75

Image-Based Localization Using LSTMs for Structured Feature Correlation

Abstract: In this work we propose a new CNN+LSTM architecture for camera pose regression for indoor and outdoor scenes. CNNs allow us to learn suitable feature representations for localization that are robust against motion blur and illumination changes. We make use of LSTM units on the CNN output, which play the role of a structured dimensionality reduction on the feature vector, leading to drastic improvements in localization performance. We provide extensive quantitative comparison of CNN-based and SIFT-based localiz…

Cited by 448 publications (438 citation statements)
References 63 publications
“…The resulting matches are then used for RANSAC-based camera pose estimation [26]. Machine learning-based approaches either replace the 2D-3D matching stage through scene coordinate regression [10,12,16,52,53,54,79], i.e., they regress the 3D point coordinate in each 2D-3D match, or directly regress the camera pose from an image [8,13,35,36,89]. The former type of methods achieves state-of-the-art localization accuracy in small-scale scenes [12,16,53], but do not seem to easily scale to larger scenes [12].…”
Section: Related Work (mentioning)
confidence: 99%
“…In order to improve the accuracy of PoseNet, several variants have been proposed in recent papers. For example, LSTM-Pose [33] makes use of LSTM units [10] on the CNN output to exploit the structured feature correlation. The LSTM units play the role of a structured dimensionality reduction on the feature vector and lead to drastic improvements in localization performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, the feasibility of pose regression with CNNs is shown in earlier works [6]. Enhanced accuracies in the task of estimating poses were derived by further improvement [27] using Long Short-Term Memory layers (LSTM) [28], a type of recurrent neural net which was combined with CNNs in the past. LSTMs handle the problem of a dissolving gradient during the back-propagation using so-called gates.…”
Section: Related Work (mentioning)
confidence: 99%