2021
DOI: 10.1109/tpami.2019.2941876

Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?

Abstract: Accurate visual localization is a key technology for autonomous navigation. 3D structure-based methods employ 3D models of the scene to estimate the full 6 degree-of-freedom (DOF) pose of a camera very accurately. However, constructing (and extending) large-scale 3D models is still a significant challenge. In contrast, 2D image retrieval-based methods only require a database of geo-tagged images, which is trivial to construct and to maintain. They are often considered inaccurate since they only approximate the…
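
To make the retrieval-based idea in the abstract concrete, here is a minimal sketch of looking up a query image in a database of geo-tagged images by descriptor similarity. It is not the paper's method: the function name `retrieve_nearest`, the three-dimensional toy descriptors, and the geo-tag values are all hypothetical, and a real system would use a learned or aggregated global descriptor (for example a VLAD-style vector) with far more dimensions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def retrieve_nearest(query_desc, database, k=1):
    """Return the k database entries whose descriptors have the highest
    cosine similarity to the query descriptor.

    `database` is a list of dicts with hypothetical keys
    "name", "descriptor" and "geotag".
    """
    q = l2_normalize(query_desc)
    scored = []
    for entry in database:
        d = l2_normalize(entry["descriptor"])
        similarity = sum(a * b for a, b in zip(q, d))
        scored.append((similarity, entry))
    # Highest similarity first; sort on the score only.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:k]]


# Toy database: descriptors and geo-tags are made up for illustration.
database = [
    {"name": "img_a", "descriptor": [1.0, 0.0, 0.0], "geotag": (50.0, 14.0)},
    {"name": "img_b", "descriptor": [0.0, 1.0, 0.0], "geotag": (50.1, 14.1)},
    {"name": "img_c", "descriptor": [0.6, 0.8, 0.0], "geotag": (50.2, 14.2)},
]

best = retrieve_nearest([0.9, 0.1, 0.0], database, k=1)[0]
print(best["name"], best["geotag"])
```

The geo-tag of the best match is only a coarse stand-in for the query's position, which is the approximation the abstract refers to; a structure-based method would instead estimate the full 6DOF pose from 2D-3D correspondences.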

Citations: cited by 49 publications (71 citation statements)
References: 89 publications
“…It is important to note here that image matching can also be included as a sub-module of a VPR system. Torii et al (2019) demonstrated that such a system can achieve accurate localisation without the need for large-scale 3D-models.…”
Section: Literature Review (mentioning)
confidence: 99%
“…McManus et al (2014) extract scene signatures from an image by utilising some a priori environment information and describe them using HOG-descriptors. DenseVLAD presented by Torii et al (2015) is a Vector-of-Locally-Aggregated-Descriptors-based approach using densely sampled SIFT keypoints, which has been shown to perform similar to deep-learning-based techniques in Sattler et al (2018) and Torii et al (2019). A more recent usage of traditional handcrafted feature descriptors for VPR was presented in CoHOG (Zaffar et al 2020) which focuses on entropy-rich regions in an image and uses HOG as the regional descriptor for convolutional-regional matching.…”
Section: Literature Review (mentioning)
confidence: 99%
“…One problem with using 3D models is that they are expensive to store and maintain, and do not scale well to large environments. To overcome this problem [214] proposes to not use a large 3D model created offline, but rather to create a small local model online (SfM-on-the-fly), using the top retrieved images from the database. This solution is shown to achieve good results, although its effectiveness deteriorates in the case of weakly textured scenes.…”
Section: F Using 3D Models (mentioning)
confidence: 99%
“…A first broad categorization of these datasets comes from the distinction between robotics and non-robotics datasets. Robotics datasets [21], [45], [151], [152], [214], [240], [241], [247]- [249], [249] are typically created by recording videos from cameras mounted on a vehicle (e.g., car) or a smaller robot. The data is thus available in sequences, with the temporal coherence among successive frames that can be leveraged to formulate motion hypotheses.…”
Section: A Datasets (mentioning)
confidence: 99%