2022
DOI: 10.1109/tpami.2020.3032010
Long-Term Visual Localization Revisited

Abstract: Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds. Practical visual localization approaches need to be robust to a wide variety of viewing conditions, including day-night changes, as well as weather and seasonal variations, while providing highly accurate six degree-of-freedom (6DOF) camera pose estimates. In this paper, we extend three publicly available datasets containing images captured under a wide variety of…

Cited by 123 publications (60 citation statements)
References 96 publications
“…Dataset. We use the CMU Extended Seasons dataset [2,54]. This dataset is collected using car-mounted cameras, producing images with a significant amount of radial distortion.…”
Section: A1 PixLoc on CMU Seasons
confidence: 99%
“…SIFT is a handcrafted feature detection algorithm whose robustness to scale changes is widely recognized. ASLFeat is a learning-based detect-and-describe feature extraction method which performs well on several widely used benchmarks [4], [50], [52]. GIFT-SP is a learning-based detect-then-describe feature extraction method, which uses SuperPoint as the keypoint detector and GIFT as the descriptor, and is robust to scale changes.…”
Section: Image Matching
confidence: 99%
“…Establishing pixel-level correspondences between two images is an essential basis for a wide range of computer vision tasks such as structure from motion [1], [2], augmented reality [3], visual localization [4], and simultaneous localization and mapping (SLAM) [5]. Such correspondences are usually estimated by sparse local feature extraction and matching [6]–[16].…”
Section: Introduction
confidence: 99%
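The sparse matching step this excerpt refers to is commonly implemented as nearest-neighbour search over descriptors, filtered by Lowe's ratio test and a mutual-consistency check. A minimal NumPy sketch (the function name and the brute-force distance computation are illustrative, not taken from the cited works):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Mutual nearest-neighbour matching with Lowe's ratio test.

    desc_a: (N, D) and desc_b: (M, D) L2-normalised descriptors.
    Returns a list of (i, j) index pairs into desc_a / desc_b.
    """
    # Brute-force pairwise Euclidean distances between the two sets.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    matches = []
    for i in range(dists.shape[0]):
        order = np.argsort(dists[i])
        best, second = order[0], order[1]
        # Ratio test: reject matches whose best distance is not clearly
        # smaller than the second-best (ambiguous correspondences).
        if dists[i, best] < ratio * dists[i, second]:
            # Mutual check: the best match must also prefer i.
            if np.argmin(dists[:, best]) == i:
                matches.append((i, int(best)))
    return matches
```

The `ratio` threshold (0.8 here) trades recall against precision; real pipelines typically tune it per descriptor type.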
“…Specifically, the description network and the detection network are combined into a single CNN. By combining the detection and description steps for joint optimization, the joint describe-then-detect pipeline achieves better performance than the detect-then-describe pipeline, especially under challenging conditions [42], [16]. However, these methods are fully supervised and rely on dense ground-truth correspondence labels for training.…”
Section: Introduction
confidence: 99%
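To illustrate the joint describe-then-detect idea from the excerpt above, here is a toy NumPy sketch that derives both a detection score and descriptors from a single dense feature map. The norm-based saliency and the 3x3 non-maximum suppression are simplifying assumptions for illustration, not the cited methods' learned detection scores:

```python
import numpy as np

def describe_and_detect(fmap, top_k=100):
    """Toy joint describe-then-detect head on a dense feature map.

    fmap: (C, H, W) array, e.g. the last activation of a CNN.
    Returns (K, 2) keypoint (x, y) coordinates and (K, C)
    L2-normalised descriptors sampled at those locations.
    """
    C, H, W = fmap.shape
    # Detection score: per-pixel feature norm, a crude stand-in for
    # the learned saliency used by real joint pipelines.
    score = np.linalg.norm(fmap, axis=0)
    # 3x3 local-maximum test (wrap-around borders, for brevity).
    is_max = np.ones((H, W), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(score, dy, axis=0), dx, axis=1)
            is_max &= score >= shifted
    ys, xs = np.nonzero(is_max)
    # Keep the top_k strongest local maxima.
    order = np.argsort(-score[ys, xs])[:top_k]
    ys, xs = ys[order], xs[order]
    # Descriptors are the feature vectors at the detected locations.
    desc = fmap[:, ys, xs].T
    desc = desc / (np.linalg.norm(desc, axis=1, keepdims=True) + 1e-8)
    return np.stack([xs, ys], axis=1), desc
```

Because detection and description read the same feature map, gradients from a matching loss would reach both tasks at once, which is the property the excerpt credits for the pipeline's robustness under challenging conditions.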