Image-Based Localization Using LSTMs for Structured Feature Correlation

Walch, Florian; Hazırbaş, Caner; Leal-Taixé, Laura; Sattler, Torsten; Hilsenbeck, Sebastian; Cremers, Daniel

doi:10.1109/iccv.2017.75

Cited by 448 publications

(438 citation statements)

References 63 publications

Supporting

Mentioning

436

Contrasting

“…The resulting matches are then used for RANSAC-based camera pose estimation [26]. Machine learning-based approaches either replace the 2D-3D matching stage through scene coordinate regression [10,12,16,52-54,79], i.e., they regress the 3D point coordinate in each 2D-3D match, or directly regress the camera pose from an image [8,13,35,36,89]. The former type of methods achieves state-of-the-art localization accuracy in small-scale scenes [12,16,53], but do not seem to easily scale to larger scenes [12].…”

Section: Related Work (mentioning)

confidence: 99%

Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization

Larsson

Stenborg

Toft

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Long-term visual localization is the problem of estimating the camera pose of a given query image in a scene whose appearance changes over time. It is an important problem in practice, for example, encountered in autonomous driving. In order to gain robustness to such changes, long-term localization approaches often use semantic segmentations as an invariant scene representation, as the semantic meaning of each scene part should not be affected by seasonal and other changes. However, these representations are typically not very discriminative due to the limited number of available classes. In this paper, we propose a new neural network, the Fine-Grained Segmentation Network (FGSN), that can be used to provide image segmentations with a larger number of labels and can be trained in a self-supervised fashion. In addition, we show how FGSNs can be trained to output consistent labels across seasonal changes. We demonstrate through extensive experiments that integrating the fine-grained segmentations produced by our FGSNs into existing localization algorithms leads to substantial improvements in localization performance.

“…In order to improve the accuracy of PoseNet, several variants have been proposed in recent papers. For example, LSTM-Pose [33] makes use of LSTM units [10] on the CNN output to exploit the structured feature correlation. The LSTM units play the role of a structured dimensionality reduction on the feature vector and lead to drastic improvements in localization performance.…”

Section: Related Work (mentioning)

confidence: 99%

Full-Frame Scene Coordinate Regression for Image-Based Localization

Li

Ylioinas

Kannala

2018

Robotics: Science and Systems XIV

Abstract: Image-based localization, or camera relocalization, is a fundamental problem in computer vision and robotics, and it refers to estimating camera pose from an image. Recent state-of-the-art approaches use learning based methods, such as Random Forests (RFs) and Convolutional Neural Networks (CNNs), to regress for each pixel in the image its corresponding position in the scene's world coordinate frame, and solve the final pose via a RANSAC-based optimization scheme using the predicted correspondences. In this paper, instead of in a patch-based manner, we propose to perform the scene coordinate regression in a full-frame manner to make the computation efficient at test time and, more importantly, to add more global context to the regression process to improve the robustness. To do so, we adopt a fully convolutional encoder-decoder neural network architecture which accepts a whole image as input and produces scene coordinate predictions for all pixels in the image. However, using more global context is prone to overfitting. To alleviate this issue, we propose to use data augmentation to generate more data for training. In addition to the data augmentation in 2D image space, we also augment the data in 3D space. We evaluate our approach on the publicly available 7-Scenes dataset, and experiments show that it has better scene coordinate predictions and achieves state-of-the-art results in localization with improved robustness on the hardest frames (e.g., frames with repeated structures).
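The citation statement in this entry describes LSTM units applied to the CNN output as a structured dimensionality reduction on the feature vector. The following is a minimal sketch of that idea, not the cited implementation: it assumes the feature vector is reshaped into a 2D grid and swept by four independent LSTMs (down, up, right, left), whose final hidden states are concatenated. The function names, the toy sizes, the random untrained weights, and the omission of bias terms are all illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm(input_size, hidden_size, rng):
    """Random (untrained) weights for one LSTM; biases omitted for brevity."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    # One weight matrix per gate: input, forget, output, candidate.
    return {gate: matrix() for gate in ("i", "f", "o", "g")}


def lstm_last_hidden(weights, sequence, hidden_size):
    """Run the LSTM over a sequence of vectors; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        xh = list(x) + h  # concatenate current input and previous hidden state
        pre = {g: [sum(w * v for w, v in zip(row, xh)) for row in weights[g]]
               for g in weights}
        i = [sigmoid(p) for p in pre["i"]]
        f = [sigmoid(p) for p in pre["f"]]
        o = [sigmoid(p) for p in pre["o"]]
        g = [math.tanh(p) for p in pre["g"]]
        # Gated cell update, then gated readout of the cell state.
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def reduce_features(feature, rows, cols, hidden_size, seed=0):
    """Reshape a flat feature vector to rows x cols and sweep it in four directions."""
    if len(feature) != rows * cols:
        raise ValueError("feature length must equal rows * cols")
    rng = random.Random(seed)
    grid = [feature[r * cols:(r + 1) * cols] for r in range(rows)]
    columns = [list(col) for col in zip(*grid)]
    sweeps = {
        "down": grid,           # rows, top to bottom
        "up": grid[::-1],       # rows, bottom to top
        "right": columns,       # columns, left to right
        "left": columns[::-1],  # columns, right to left
    }
    reduced = []
    for seq in sweeps.values():
        lstm = make_lstm(len(seq[0]), hidden_size, rng)
        reduced.extend(lstm_last_hidden(lstm, seq, hidden_size))
    return reduced  # length is 4 * hidden_size


if __name__ == "__main__":
    toy_feature = [math.sin(k) for k in range(24)]  # stand-in for a CNN feature vector
    print(len(reduce_features(toy_feature, rows=4, cols=6, hidden_size=3)))
```

With these toy sizes, a 24-dimensional vector is reduced to 12 values. In a trained network the LSTM weights would be learned jointly with the pose regressor, and the concatenated output would feed the final pose prediction layers.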

“…However, the feasibility of pose regression with CNNs is shown in earlier works [6]. Enhanced accuracies in the task of estimating poses were derived by further improvement [27] using Long Short-Term Memory layers (LSTM) [28], a type of recurrent neural net which was combined with CNNs in the past. LSTMs handle the problem of a dissolving gradient during the back-propagation using so-called gates.…”

Section: Related Work (mentioning)

confidence: 99%

UAS Navigation with SqueezePoseNet—Accuracy Boosting for Pose Regression by Data Augmentation

Mueller

Jutzi

2018

Drones

Abstract: The navigation of Unmanned Aerial Vehicles (UAVs) nowadays is mostly based on Global Navigation Satellite Systems (GNSSs). Drawbacks of satellite-based navigation are failures caused by occlusions or multi-path interferences. Therefore, alternative methods have been developed in recent years. Visual navigation methods such as Visual Odometry (VO) or visual Simultaneous Localization and Mapping (SLAM) aid global navigation solutions by closing trajectory gaps or performing loop closures. However, if the trajectory estimation is interrupted or not available, a re-localization is mandatory. Furthermore, the latest research has shown promising results on pose regression in 6 Degrees of Freedom (DoF) based on Convolutional Neural Networks (CNNs). Additionally, existing navigation methods can benefit from these networks. In this article, a method for GNSS-free and fast image-based pose regression by utilizing a small Convolutional Neural Network is presented. Therefore, a small CNN (SqueezePoseNet) is utilized, transfer learning is applied and the network is tuned for pose regression. Furthermore, recent drawbacks are overcome by applying data augmentation on a training dataset utilizing simulated images. Experiments with small CNNs show promising results for GNSS-free and fast localization compared to larger networks. By training a CNN with an extended data set including simulated images, the accuracy on pose regression is improved up to 61.7% for position and up to 76.0% for rotation compared to training on a standard not-augmented data set.
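The abstract above reports accuracy gains separately for position and rotation. As a rough illustration of how such figures can be computed for 6-DoF pose regression, the sketch below assumes position error is the Euclidean distance between predicted and true camera positions, rotation error is the angle between predicted and true unit quaternions, and improvement is the relative reduction in error. These definitions and the function names are assumptions made for illustration; the abstract does not state which metrics the article uses.

```python
import math


def position_error(t_pred, t_true):
    """Euclidean distance between predicted and true camera positions."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(t_pred, t_true)))


def rotation_error_deg(q_pred, q_true):
    """Angle in degrees between two orientations given as quaternions (w, x, y, z)."""
    def normalize(q):
        norm = math.sqrt(sum(v * v for v in q))
        return [v / norm for v in q]

    a, b = normalize(q_pred), normalize(q_true)
    # abs() treats q and -q as the same rotation; min() guards against rounding above 1.
    dot = min(1.0, abs(sum(x * y for x, y in zip(a, b))))
    return math.degrees(2.0 * math.acos(dot))


def relative_improvement(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    print(position_error((1.0, 2.0, 3.0), (1.0, 2.0, 5.0)))
    print(rotation_error_deg(identity, quarter_turn_z))
    print(relative_improvement(2.0, 0.5))
```

The inputs in the usage lines are made-up values chosen only to exercise the functions; they are not results from the article.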
