2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01200

Hierarchical Scene Coordinate Classification and Regression for Visual Localization

Abstract: Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a …
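
The truncated abstract already names the key mechanism: a network regresses the pixel-to-3D mapping, so 2D-3D matching happens implicitly in the forward pass, and the camera pose can then be solved geometrically. As a rough, hypothetical sketch of that pipeline (the toy network, the 4x subsampling, and all names below are assumptions, not the paper's actual model):

```python
# Minimal sketch of scene coordinate regression for visual localization:
# a network predicts per-pixel 3D scene coordinates, and the pose is
# recovered from the resulting 2D-3D matches. Illustrative only.
import numpy as np
import cv2
import torch
import torch.nn as nn

class SceneCoordNet(nn.Module):
    """Toy fully convolutional net: RGB image -> per-pixel 3D scene coordinates."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 3, 1),  # 3 channels = (X, Y, Z) in scene space
        )

    def forward(self, image):
        return self.net(image)  # (B, 3, H/4, W/4) coordinate map

def localize(image_bgr, net, K):
    """Predict scene coordinates, then solve the 6-DoF pose with PnP + RANSAC."""
    inp = torch.from_numpy(image_bgr).float().permute(2, 0, 1)[None] / 255.0
    with torch.no_grad():
        coords = net(inp)[0].permute(1, 2, 0).numpy()  # (h, w, 3)
    h, w, _ = coords.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Predictions live on a 4x-subsampled grid; map them back to pixel centers.
    pts2d = np.stack([xs * 4 + 2, ys * 4 + 2], -1).reshape(-1, 2).astype(np.float64)
    pts3d = coords.reshape(-1, 3).astype(np.float64)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return rvec, tvec  # rotation (Rodrigues vector) and translation
```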

Cited by 94 publications (105 citation statements). References 53 publications.

“…Li et al [52] improved on this initial effort by enforcing multi-view and photometric consistency throughout training. In a follow-up work, Li et al [54] introduce a joint classification-regression network architecture for predicting scene coordinates, and demonstrate that training data augmentation yields large improvements on standard benchmarks.…”
Section: Scene Coordinate Regression
confidence: 99%
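
For intuition, a joint classification-regression objective of the kind this quote describes can be sketched as follows: the scene is discretized into regions, the network both classifies the region and regresses the metric coordinate, and the two terms are summed. The partitioning, weighting, and names here are hypothetical, not the exact formulation from [54]:

```python
# Hypothetical joint classification-regression loss for scene coordinates.
# Region partition, L1 regression term, and the weight w_reg are assumptions.
import torch
import torch.nn.functional as F

def joint_loss(region_logits, coord_pred, region_gt, coord_gt, w_reg=1.0):
    """region_logits: (N, num_regions) scores over discrete scene regions.
    coord_pred: (N, 3) regressed 3D coordinates; coord_gt: (N, 3) ground truth.
    region_gt: (N,) ground-truth region labels."""
    cls_term = F.cross_entropy(region_logits, region_gt)   # which region?
    reg_term = F.l1_loss(coord_pred, coord_gt)             # where exactly?
    return cls_term + w_reg * reg_term
```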
“…We convert input images to grayscale and re-scale them to 480px height. For training, we follow Li et al [54] and apply data augmentation. We apply random adjustments of brightness and contrast of the input image within a ±10% range.…”
Section: Setup
confidence: 99%
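
The quoted setup is concrete enough to sketch directly: grayscale conversion, rescaling to a height of 480px, and brightness/contrast jitter within ±10%. A minimal torchvision version, where everything beyond those quoted values (interpolation defaults, tensor conversion) is an assumption:

```python
# Sketch of the quoted preprocessing/augmentation. The +/-10% jitter and
# 480px height come from the quote; the rest are assumed defaults.
import torchvision.transforms as T
import torchvision.transforms.functional as TF

def resize_to_height(img, height=480):
    """Rescale a PIL image so its height is `height`, keeping aspect ratio."""
    w, h = img.size  # PIL convention: (width, height)
    return TF.resize(img, [height, max(1, round(w * height / h))])

train_transform = T.Compose([
    T.Grayscale(num_output_channels=1),
    T.Lambda(resize_to_height),
    # Brightness/contrast factors are drawn uniformly from [0.9, 1.1],
    # i.e. the +/-10% range mentioned in the quote.
    T.ColorJitter(brightness=0.1, contrast=0.1),
    T.ToTensor(),
])
```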
“…In 2019, they proposed an improved version named NG-DSAC [18], which optimizes an arbitrary task loss during training by additionally letting the network learn a probability for each scene coordinate, thereby improving accuracy. Li et al proposed a hierarchical scene coordinate network [48] that predicts pixel scene coordinates in a coarse-to-fine manner, significantly narrowing the performance gap with feature-matching methods. In general, although these methods achieve high accuracy in small scenes, they are still not scalable enough for large scenes due to the limitations of space projection methods.…”
Section: Space Reprojection Based Camera Localization
confidence: 99%
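
The coarse-to-fine idea in this quote can be illustrated with a toy decoder: each hierarchy level selects a sub-cell inside the cell chosen at the previous level, and a final regression adds a continuous residual inside the finest cell. The two-level layout and cell sizes below are purely illustrative assumptions:

```python
# Toy two-level coarse-to-fine decoding of a scene coordinate. A real
# hierarchical network conditions each level's prediction on the previous
# one; the 10 m cells and 4-way split here are illustrative assumptions.
import numpy as np

def decode_coarse_to_fine(coarse_idx, fine_idx, residual,
                          origin=np.zeros(3), coarse_cell=10.0, fine_split=4):
    """coarse_idx/fine_idx: integer 3-vectors picking a cell at each level;
    residual: continuous offset (meters) regressed inside the finest cell."""
    fine_cell = coarse_cell / fine_split  # 2.5 m sub-cells
    corner = (origin
              + np.asarray(coarse_idx) * coarse_cell
              + np.asarray(fine_idx) * fine_cell)
    return corner + residual

# Example: coarse cell (2, 0, 1), sub-cell (3, 1, 0), 0.4/0.2/1.1 m residual.
p = decode_coarse_to_fine([2, 0, 1], [3, 1, 0], np.array([0.4, 0.2, 1.1]))
# -> array([27.9,  2.7, 11.1])
```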
“…In the case of Kendall et al. (2015), Walch et al. (2017), and more recently Li et al. (2019), the output can be as complex as a 6-degree-of-freedom pose regression relative to the original “map” of images. What all of these systems have in common, however, is that the domain against which they can match novel images is fixed at the point at which they are trained.…”
Section: Related Work
confidence: 99%