2022
DOI: 10.1109/tpami.2020.3032010
Long-Term Visual Localization Revisited

Abstract: Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds. Practical visual localization approaches need to be robust to a wide variety of viewing conditions, including day-night changes, as well as weather and seasonal variations, while providing highly accurate six degree-of-freedom (6DOF) camera pose estimates. In this paper, we extend three publicly available datasets containing images captured under a wide variety of…

Cited by 123 publications (60 citation statements)
References 96 publications
“…Dataset. We use the CMU Extended Seasons dataset [2,54]. This dataset is collected using car-mounted cameras, producing images with a significant amount of radial distortion.…”
Section: A1 PixLoc on CMU Seasons
confidence: 99%
“…SIFT is a handcrafted feature detection algorithm whose robustness to scale changes is widely recognized. ASLFeat is a learning-based detect-and-describe feature extraction method which performs well on several widely used benchmarks [4], [50], [52]. GIFT-SP is a learning-based detect-then-describe feature extraction method, which uses SuperPoint as the keypoint detector and GIFT as the descriptor, and is robust to scale changes.…”
Section: Image Matching
confidence: 99%
“…Establishing pixel-level correspondences between two images is an essential basis for a wide range of computer vision tasks such as structure from motion [1], [2], augmented reality [3], visual localization [4], and simultaneous localization and mapping (SLAM) [5]. Such correspondences are usually estimated by sparse local feature extraction and matching [6]–[16].…”
Section: Introduction
confidence: 99%
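The sparse matching step this excerpt refers to is commonly implemented as nearest-neighbour search over descriptors, filtered by Lowe's ratio test and a mutual-consistency check. A minimal NumPy sketch (the function name and the brute-force distance computation are illustrative, not taken from the cited works):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Mutual nearest-neighbour matching with Lowe's ratio test.

    desc_a: (N, D) and desc_b: (M, D) L2-normalised descriptors.
    Returns a list of (i, j) index pairs into desc_a / desc_b.
    """
    # Brute-force pairwise Euclidean distances between the two sets.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    matches = []
    for i in range(dists.shape[0]):
        order = np.argsort(dists[i])
        best, second = order[0], order[1]
        # Ratio test: reject matches whose best distance is not clearly
        # smaller than the second-best (ambiguous correspondences).
        if dists[i, best] < ratio * dists[i, second]:
            # Mutual check: the best match must also prefer i.
            if np.argmin(dists[:, best]) == i:
                matches.append((i, int(best)))
    return matches
```

The `ratio` threshold (0.8 here) trades recall against precision; real pipelines typically tune it per descriptor type.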
“…Specifically, the description network and the detection network are combined into a single CNN. By combining the detection and description steps for joint optimization, the joint describe-then-detect pipeline achieves better performance than the detect-then-describe pipeline, especially under challenging conditions [42], [16]. However, these methods are fully supervised and rely on dense ground-truth correspondence labels for training.…”
Section: Introduction
confidence: 99%
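To illustrate the joint describe-then-detect idea from the excerpt above, here is a toy NumPy sketch that derives both a detection score and descriptors from a single dense feature map. The norm-based saliency and the 3x3 non-maximum suppression are simplifying assumptions for illustration, not the cited methods' learned detection scores:

```python
import numpy as np

def describe_and_detect(fmap, top_k=100):
    """Toy joint describe-then-detect head on a dense feature map.

    fmap: (C, H, W) array, e.g. the last activation of a CNN.
    Returns (K, 2) keypoint (x, y) coordinates and (K, C)
    L2-normalised descriptors sampled at those locations.
    """
    C, H, W = fmap.shape
    # Detection score: per-pixel feature norm, a crude stand-in for
    # the learned saliency used by real joint pipelines.
    score = np.linalg.norm(fmap, axis=0)
    # 3x3 local-maximum test (wrap-around borders, for brevity).
    is_max = np.ones((H, W), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(score, dy, axis=0), dx, axis=1)
            is_max &= score >= shifted
    ys, xs = np.nonzero(is_max)
    # Keep the top_k strongest local maxima.
    order = np.argsort(-score[ys, xs])[:top_k]
    ys, xs = ys[order], xs[order]
    # Descriptors are the feature vectors at the detected locations.
    desc = fmap[:, ys, xs].T
    desc = desc / (np.linalg.norm(desc, axis=1, keepdims=True) + 1e-8)
    return np.stack([xs, ys], axis=1), desc
```

Because detection and description read the same feature map, gradients from a matching loss would reach both tasks at once, which is the property the excerpt credits for the pipeline's robustness under challenging conditions.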