2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01200

Hierarchical Scene Coordinate Classification and Regression for Visual Localization

Abstract: Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a …
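
The truncated abstract already names the key mechanism: a network regresses the pixel-to-3D mapping, so 2D-3D matching happens implicitly in the forward pass, and the camera pose can then be solved geometrically. As a rough, hypothetical sketch of that pipeline (the toy network, the 4x subsampling, and all names below are assumptions, not the paper's actual model):

```python
# Minimal sketch of scene coordinate regression for visual localization:
# a network predicts per-pixel 3D scene coordinates, and the pose is
# recovered from the resulting 2D-3D matches. Illustrative only.
import numpy as np
import cv2
import torch
import torch.nn as nn

class SceneCoordNet(nn.Module):
    """Toy fully convolutional net: RGB image -> per-pixel 3D scene coordinates."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 3, 1),  # 3 channels = (X, Y, Z) in scene space
        )

    def forward(self, image):
        return self.net(image)  # (B, 3, H/4, W/4) coordinate map

def localize(image_bgr, net, K):
    """Predict scene coordinates, then solve the 6-DoF pose with PnP + RANSAC."""
    inp = torch.from_numpy(image_bgr).float().permute(2, 0, 1)[None] / 255.0
    with torch.no_grad():
        coords = net(inp)[0].permute(1, 2, 0).numpy()  # (h, w, 3)
    h, w, _ = coords.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Predictions live on a 4x-subsampled grid; map them back to pixel centers.
    pts2d = np.stack([xs * 4 + 2, ys * 4 + 2], -1).reshape(-1, 2).astype(np.float64)
    pts3d = coords.reshape(-1, 3).astype(np.float64)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return rvec, tvec  # rotation (Rodrigues vector) and translation
```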

Cited by 94 publications (105 citation statements). References 53 publications.

“…Li et al [52] improved on this initial effort by enforcing multi-view and photometric consistency throughout training. In a follow-up work, Li et al [54] introduce a joint classification-regression network architecture for predicting scene coordinates, and demonstrate that training data augmentation yields large improvements on standard benchmarks.…”
Section: Scene Coordinate Regression
confidence: 99%
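
For intuition, a joint classification-regression objective of the kind this quote describes can be sketched as follows: the scene is discretized into regions, the network both classifies the region and regresses the metric coordinate, and the two terms are summed. The partitioning, weighting, and names here are hypothetical, not the exact formulation from [54]:

```python
# Hypothetical joint classification-regression loss for scene coordinates.
# Region partition, L1 regression term, and the weight w_reg are assumptions.
import torch
import torch.nn.functional as F

def joint_loss(region_logits, coord_pred, region_gt, coord_gt, w_reg=1.0):
    """region_logits: (N, num_regions) scores over discrete scene regions.
    coord_pred: (N, 3) regressed 3D coordinates; coord_gt: (N, 3) ground truth.
    region_gt: (N,) ground-truth region labels."""
    cls_term = F.cross_entropy(region_logits, region_gt)   # which region?
    reg_term = F.l1_loss(coord_pred, coord_gt)             # where exactly?
    return cls_term + w_reg * reg_term
```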
“…We convert input images to grayscale and re-scale them to 480px height. For training, we follow Li et al [54] and apply data augmentation. We apply random adjustments of brightness and contrast of the input image within a ±10% range.…”
Section: Setup
confidence: 99%
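
The quoted setup is concrete enough to sketch directly: grayscale conversion, rescaling to a height of 480px, and brightness/contrast jitter within ±10%. A minimal torchvision version, where everything beyond those quoted values (interpolation defaults, tensor conversion) is an assumption:

```python
# Sketch of the quoted preprocessing/augmentation. The +/-10% jitter and
# 480px height come from the quote; the rest are assumed defaults.
import torchvision.transforms as T
import torchvision.transforms.functional as TF

def resize_to_height(img, height=480):
    """Rescale a PIL image so its height is `height`, keeping aspect ratio."""
    w, h = img.size  # PIL convention: (width, height)
    return TF.resize(img, [height, max(1, round(w * height / h))])

train_transform = T.Compose([
    T.Grayscale(num_output_channels=1),
    T.Lambda(resize_to_height),
    # Brightness/contrast factors are drawn uniformly from [0.9, 1.1],
    # i.e. the +/-10% range mentioned in the quote.
    T.ColorJitter(brightness=0.1, contrast=0.1),
    T.ToTensor(),
])
```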
“…In 2019, they proposed an improved version named NG-DSAC [18], which optimizes an arbitrary task loss during training by additionally letting the network learn a probability for each scene coordinate, thereby improving accuracy. Li et al proposed a hierarchical scene coordinate network [48] that predicts pixel scene coordinates in a coarse-to-fine manner, significantly narrowing the performance gap with feature-matching methods. In general, although these methods achieve high accuracy in small scenes, they are still not scalable enough for large scenes due to the limitations of space projection methods.…”
Section: Space Reprojection Based Camera Localization
confidence: 99%
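
The coarse-to-fine idea in this quote can be illustrated with a toy decoder: each hierarchy level selects a sub-cell inside the cell chosen at the previous level, and a final regression adds a continuous residual inside the finest cell. The two-level layout and cell sizes below are purely illustrative assumptions:

```python
# Toy two-level coarse-to-fine decoding of a scene coordinate. A real
# hierarchical network conditions each level's prediction on the previous
# one; the 10 m cells and 4-way split here are illustrative assumptions.
import numpy as np

def decode_coarse_to_fine(coarse_idx, fine_idx, residual,
                          origin=np.zeros(3), coarse_cell=10.0, fine_split=4):
    """coarse_idx/fine_idx: integer 3-vectors picking a cell at each level;
    residual: continuous offset (meters) regressed inside the finest cell."""
    fine_cell = coarse_cell / fine_split  # 2.5 m sub-cells
    corner = (origin
              + np.asarray(coarse_idx) * coarse_cell
              + np.asarray(fine_idx) * fine_cell)
    return corner + residual

# Example: coarse cell (2, 0, 1), sub-cell (3, 1, 0), 0.4/0.2/1.1 m residual.
p = decode_coarse_to_fine([2, 0, 1], [3, 1, 0], np.array([0.4, 0.2, 1.1]))
# -> array([27.9,  2.7, 11.1])
```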
“…In the case of Kendall et al. (2015), Walch et al. (2017), and more recently Li et al. (2019), the output can be as complex as a 6-degree-of-freedom pose regression relative to the original “map” of images. What all of these systems have in common, however, is that the domain against which they can match novel images is fixed at the point at which they are trained.…”
Section: Related Work
confidence: 99%