Let's Take This Online: Adapting Scene Coordinate Regression Network Predictions for Online RGB-D Camera Relocalisation

Cavallari, Tommaso; Bertinetto, Luca; Mukhoti, Jishnu; Torr, Philip H. S.; Golodetz, Stuart

doi:10.1109/3dv.2019.00068

Cited by 33 publications

(26 citation statements)

References 71 publications

(288 reference statements)


“…Another recent line of work on single-image localization has focused on machine learning [7,8,10,11,14,15,37,38,57,75,86,88]. Scene coordinate regression approaches [7,8,14,15,57,75,88] train a random forest or convolutional neural network (CNN) to predict the corresponding 3D coordinate for each pixel.…”

Section: Related Work (mentioning)

confidence: 99%

“…The 2D-3D matches are then used for camera pose estimation, e.g., by applying a PnP solver [1,31,41,43,45,46] inside a robust estimator such as RANSAC [4,9,17,25,47,66]. These visual localization methods typically use either local image descriptors [19,22,34,52] to explicitly match 2D features to 3D scene points or use machine learning, e.g., via a random forest [15,16] or a convolutional neural network (CNN) [6,7,14], to regress the corresponding 3D scene coordinate per pixel. They build a scene representation, e.g., a 3D Structure-from-Motion (SfM) model for local features or a CNN for scene coordinate regression, from a set of reference images.…”

Section: Introduction (mentioning)

confidence: 99%
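The statement above describes the usual second stage of these pipelines: 2D-3D matches are passed to a PnP solver inside a robust estimator such as RANSAC. A minimal sketch of that idea follows. It is not the method of any cited paper. It assumes known intrinsics (so 2D points are given in normalised image coordinates), swaps in a simple linear DLT solver in place of the minimal solvers the cited works use, and the function names `dlt_pose`, `reproj_error` and `ransac_pnp` are hypothetical.

```python
import numpy as np

def dlt_pose(pts3d, pts2d):
    """Linear PnP (DLT) from >= 6 matches; pts2d are normalised image coords."""
    n = len(pts3d)
    Xh = np.hstack([pts3d, np.ones((n, 1))])      # homogeneous 3D points
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -pts2d[:, [0]] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -pts2d[:, [1]] * Xh
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)     # null vector, up to scale/sign
    if np.linalg.det(P[:, :3]) < 0:               # fix the sign ambiguity
        P = -P
    U, S, Vt = np.linalg.svd(P[:, :3])
    return U @ Vt, P[:, 3] / S.mean()             # nearest rotation, rescaled t

def reproj_error(R, t, pts3d, pts2d):
    """Per-match reprojection error; points behind the camera get inf."""
    cam = pts3d @ R.T + t
    z = cam[:, 2]
    safe_z = np.where(np.abs(z) < 1e-9, 1e-9, z)
    err = np.linalg.norm(cam[:, :2] / safe_z[:, None] - pts2d, axis=1)
    err[z <= 0] = np.inf
    return err

def ransac_pnp(pts3d, pts2d, iters=200, thresh=1e-3, seed=0):
    """RANSAC loop: sample 6 matches, fit a pose, keep the largest consensus."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts3d), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(pts3d), size=6, replace=False)
        R, t = dlt_pose(pts3d[idx], pts2d[idx])
        inliers = reproj_error(R, t, pts3d, pts2d) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 6:
        raise ValueError("RANSAC found no usable consensus set")
    R, t = dlt_pose(pts3d[best], pts2d[best])     # refit on all inliers
    return R, t, best
```

Here `pts3d` plays the role of the predicted scene coordinates (from a forest, a CNN, or descriptor matching) and `pts2d` the pixels they were predicted for. The returned `R` and `t` map scene coordinates into the camera frame.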


Using Image Sequences for Long-Term Visual Localization

Stenborg, Sattler, Hammarstrand (2020)

2020 International Conference on 3D Vision (3DV)

Estimating the pose of a camera in a known scene, i.e., visual localization, is a core task for applications such as self-driving cars. In many scenarios, image sequences are available and existing work on combining single-image localization with odometry offers to unlock their potential for improving localization performance. Still, the largest part of the literature focuses on single-image localization and ignores the availability of sequence data. The goal of this paper is to demonstrate the potential of image sequences in challenging scenarios, e.g., under day-night or seasonal changes. Combining ideas from the literature, we describe a sequence-based localization pipeline that combines odometry with both a coarse and a fine localization module. Experiments on long-term localization datasets show that combining single-image global localization against a prebuilt map with a visual odometry / SLAM pipeline improves performance to a level where the extended CMU Seasons dataset can be considered solved. We show that SIFT features can perform on par with modern state-of-the-art features in our framework, despite being much weaker and a magnitude faster to compute. Our code is publicly available at github.com/rulllars.


“…Active Search [43] and an indoor localization method which exploits dense correspondences [53]. Note that, in general, methods that exploit additional depth information [11,12]…”

Section: Results on 7-Scenes, 12-Scenes and Cambridge (mentioning)

confidence: 99%

“…Instead of learning the entire pipeline, scene coordinate regression methods learn the first stage of the pipeline in the structure-based approaches. Namely, either a random forest [4,12,13,20,30,32,33,50,57] or a neural network [3,5,6,7,9,10,11,27,28,30] is trained to directly predict 3D scene coordinates for the pixels and thus the 2D-3D correspondences are established. These methods do not explicitly rely on feature detection, description and matching, and are able to provide correspondences densely.…”

Section: Related Work (mentioning)

confidence: 99%


Hierarchical Scene Coordinate Classification and Regression for Visual Localization

Wang, Zhao, et al. (2020)

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The network consists of a series of output layers with each of them conditioned on the previous ones. The final output layer predicts the 3D coordinates and the others produce progressively finer discrete location labels. The proposed method outperforms the baseline regression-only network and allows us to train single compact models which scale robustly to large environments. It sets a new state-of-the-art for single-image RGB localization performance on the 7-Scenes, 12-Scenes, Cambridge Landmarks datasets, and three combined scenes. Moreover, for large-scale outdoor localization on the Aachen Day-Night dataset, our approach is much more accurate than existing scene coordinate regression approaches, and reduces significantly the performance gap w.r.t. explicit feature matching approaches.
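The coarse-to-fine idea in this abstract (discrete location labels at the earlier layers, 3D coordinates at the last one) can be illustrated with a two-level label scheme. This is only a sketch under stated assumptions, not the paper's network: it assumes each label indexes a cluster centre and that the last stage supplies a residual relative to the chosen fine cluster, which the abstract does not state. The names `encode`, `decode`, `coarse_centres` and `fine_offsets` are hypothetical.

```python
import numpy as np

def encode(point, coarse_centres, fine_offsets):
    """Split a 3D point into (coarse label, fine label, residual).

    coarse_centres: (C, 3) cluster centres in scene space.
    fine_offsets:   (C, F, 3) sub-cluster centres relative to each coarse centre.
    """
    c = int(np.argmin(np.linalg.norm(coarse_centres - point, axis=1)))
    local = point - coarse_centres[c]
    f = int(np.argmin(np.linalg.norm(fine_offsets[c] - local, axis=1)))
    return c, f, local - fine_offsets[c, f]

def decode(c, f, residual, coarse_centres, fine_offsets):
    """Rebuild the 3D scene coordinate from the labels and the residual."""
    return coarse_centres[c] + fine_offsets[c, f] + residual
```

In a trained model the two labels and the residual would be per-pixel network outputs. Here `encode` only shows what the training targets could look like, and `decode` shows how the outputs would combine into a scene coordinate.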
