Extending Absolute Pose Regression to Multiple Scenes

Blanton, Hunter; Greenwell, Connor; Workman, Scott; Jacobs, Nathan

doi:10.1109/cvprw50498.2020.00027

Cited by 21 publications

(12 citation statements)

References 17 publications

“…Tables 1 and 2 show the results obtained with our method (MS-Transformer) and with MSPN on the CambridgeLandmarks and the 7Scenes datasets, respectively. Since MSPN was trained on different scene combinations from the CambridgeLandmarks dataset, we take the best performing model reported by the authors on this dataset [3]. Our method consistently outperforms MSPN across outdoor and indoor scenes, reducing both position and orientation errors.…”

Section: Comparative Analysis Of APRs (mentioning)

confidence: 99%
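
The position and orientation errors mentioned in the statement above are commonly computed as a Euclidean distance and a quaternion angle. The sketch below assumes that convention; the citing paper's exact evaluation code is not shown on this page, and the function names are illustrative only.

```python
import math

def position_error(pred, truth):
    """Euclidean distance between predicted and ground-truth positions."""
    return math.dist(pred, truth)

def orientation_error_deg(q_pred, q_true):
    """Angle in degrees between two orientations given as quaternions.

    Both quaternions are normalised first. The absolute value of the dot
    product makes q and -q (the same rotation) give an error of zero.
    """
    def normalize(q):
        norm = math.sqrt(sum(c * c for c in q))
        return [c / norm for c in q]

    a, b = normalize(q_pred), normalize(q_true)
    dot = abs(sum(x * y for x, y in zip(a, b)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))

# Hypothetical example values (w, x, y, z quaternion order is an assumption).
identity = [1.0, 0.0, 0.0, 0.0]
quarter_turn_z = [math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)]
pos_err = position_error([0.0, 0.0, 0.0], [3.0, 4.0, 0.0])
rot_err = orientation_error_deg(identity, quarter_turn_z)
```

Benchmarks in this area typically report the median of each error over a scene's test images.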

“…However, similar to APRs, a model needs to be trained per scene. In addition, these methods are challenging to implement, require a long time to converge and are slower (100ms) by an order of magnitude compared to absolute pose regression approaches (10ms) at inference time [3]. They also suffer from a non-deterministic behavior due to the inherent randomness of RANSAC.…”

Section: Related Work (mentioning)

confidence: 99%

“…Multi-Scene Absolute Pose Regression methods aim to extend the absolute pose regression paradigm for learning a single model on multiple scenes. Blanton et al proposed the Multi-Scene PoseNet (MSPN), a novel multi-scene absolute pose regression approach [3], where the network first classifies the particular scene related to the input image, and then uses it to index a set of scene-specific weights for regressing the pose. An activation map from a CNN backbone, which is shared across scenes, is used both for scene classification and regressing the pose.…”

Section: Related Work (mentioning)

confidence: 99%
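
The classify-then-index scheme described in the statement above can be sketched as follows. This is a minimal illustration with random, untrained weights, not the MSPN implementation: the feature size, the number of scenes, the 7-value pose layout (3 position values plus a 4-value quaternion) and all names are assumptions.

```python
import math
import random

def linear(weights, bias, x):
    """Apply y = W x + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def random_layer(rng, n_out, n_in):
    """Create a randomly initialised (untrained) linear layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, bias

def predict_pose(features, scene_classifier, scene_heads):
    """Classify the scene, then regress the pose with that scene's weights."""
    scene_logits = linear(*scene_classifier, features)
    # The predicted scene index selects one set of scene-specific weights.
    scene_id = max(range(len(scene_logits)), key=scene_logits.__getitem__)
    raw = linear(*scene_heads[scene_id], features)
    position, quat = raw[:3], raw[3:]
    norm = math.sqrt(sum(q * q for q in quat)) or 1.0
    orientation = [q / norm for q in quat]  # unit quaternion
    return scene_id, position, orientation

rng = random.Random(0)
FEATURE_DIM, NUM_SCENES = 8, 4  # hypothetical sizes
# Stand-in for the shared backbone's output for one image.
features = [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
scene_classifier = random_layer(rng, NUM_SCENES, FEATURE_DIM)
scene_heads = [random_layer(rng, 7, FEATURE_DIM) for _ in range(NUM_SCENES)]
scene_id, position, orientation = predict_pose(features, scene_classifier, scene_heads)
```

The same feature vector feeds both the scene classifier and the selected regression head, mirroring the shared activation map mentioned in the statement.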

“…(1) employing a CNN backbone to output a single global latent vector which is used for regressing the pose (2) training a model per scene (scene-specific APRs). Recently, Blanton et al [3] suggested a method for extending single-scene absolute pose regression to a multi-scene paradigm. Similarly to existing APRs, this method applies a CNN backbone for generating a latent global descriptor of the image.…”

Section: Introduction (mentioning)

confidence: 99%

Learning Multi-Scene Absolute Pose Regression with Transformers

Shavit, Ferens, Keller

2021

Preprint

Absolute camera pose regressors estimate the position and orientation of a camera from the captured image alone. Typically, a convolutional backbone with a multi-layer perceptron head is trained with images and pose labels to embed a single reference scene at a time. Recently, this scheme was extended for learning multiple scenes by replacing the MLP head with a set of fully connected layers. In this work, we propose to learn multi-scene absolute camera pose regression with Transformers, where encoders are used to aggregate activation maps with self-attention and decoders transform latent features and scenes encoding into candidate pose predictions. This mechanism allows our model to focus on general features that are informative for localization while embedding multiple scenes in parallel. We evaluate our method on commonly benchmarked indoor and outdoor datasets and show that it surpasses both multi-scene and state-of-the-art single-scene absolute pose regressors. We make our code publicly available.

“…APRs are typically trained per scene, encoding images with a convolutional backbone and then regressing the camera pose parameters with a multi-layer perceptron (MLP) [25,23,24,28,29,48,36]. This scheme was recently extended to learn multiple scenes with a single model using Transformers [38] or by indexing scene-specific weights [5]. Pose encoding was also proposed as a means for introducing scene priors and improving performance [39].…”

Section: Visual Localization (mentioning)

confidence: 99%

Learning to Localize in Unseen Scenes with Relative Pose Regressors

Idan, Shavit, Keller

2023

Preprint

Relative pose regressors (RPRs) localize a camera by estimating its relative translation and rotation to a pose-labelled reference. Unlike scene coordinate regression and absolute pose regression methods, which learn absolute scene parameters, RPRs can (theoretically) localize in unseen environments, since they only learn the residual pose between camera pairs. In practice, however, the performance of RPRs is significantly degraded in unseen scenes. In this work, we propose to aggregate paired feature maps into latent codes, instead of operating on global image descriptors, in order to improve the generalization of RPRs. We implement aggregation with concatenation, projection, and attention operations (Transformer Encoders) and learn to regress the relative pose parameters from the resulting latent codes. We further make use of a recently proposed continuous representation of rotation matrices, which alleviates the limitations of the commonly used quaternions. Compared to state-of-the-art RPRs, our model is shown to localize significantly better in unseen environments, across both indoor and outdoor benchmarks, while maintaining competitive performance in seen scenes. We validate our findings and architecture design through multiple ablations. Our code and pretrained models are publicly available.
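
As a rough illustration of how a relative pose yields an absolute one, the sketch below composes a known reference pose with an estimated relative pose. It assumes camera-to-world rotation matrices and a relative pose expressed in the reference camera's frame; the paper's own convention is not given on this page, and the function names are hypothetical.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec3(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def compose_pose(ref_rot, ref_pos, rel_rot, rel_pos):
    """Return the query camera's absolute pose.

    Assumed convention: R_query = R_ref * R_rel and
    t_query = R_ref * t_rel + t_ref.
    """
    query_rot = matmul3(ref_rot, rel_rot)
    rotated = matvec3(ref_rot, rel_pos)
    query_pos = [r + t for r, t in zip(rotated, ref_pos)]
    return query_rot, query_pos

# Hypothetical reference: rotated 90 degrees about z, located at (1, 0, 0).
ref_rot = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
ref_pos = [1.0, 0.0, 0.0]
# Hypothetical regressor output: no relative rotation, one unit along the
# reference camera's x axis.
rel_rot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rel_pos = [1.0, 0.0, 0.0]
query_rot, query_pos = compose_pose(ref_rot, ref_pos, rel_rot, rel_pos)
```

Because only the relative pose is learned, the reference pose can come from any scene with pose-labelled images, which is the property the abstract relies on for unseen environments.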
