2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00273

Learning Multi-Scene Absolute Pose Regression with Transformers

Abstract: Absolute camera pose regressors estimate the position and orientation of a camera given the captured image alone. Typically, a convolutional backbone with a multi-layer perceptron (MLP) head is trained using images and pose labels to embed a single reference scene at a time. Recently, this scheme was extended to learn multiple scenes by replacing the MLP head with a set of fully connected layers. In this work, we propose to learn multi-scene absolute camera pose regression with Transformers, where encoders are…
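
The abstract frames the task as predicting a camera position and orientation from a single image. As a rough illustration of how such a prediction can be scored against a ground-truth pose, the sketch below computes a position error and an orientation error. It assumes, and the abstract does not state this, that the position is a 3-vector in metric units and the orientation is a quaternion; the function names `position_error` and `orientation_error_deg` are hypothetical.

```python
import math


def position_error(t_pred, t_true):
    """Euclidean distance between predicted and ground-truth camera positions."""
    return math.dist(t_pred, t_true)


def orientation_error_deg(q_pred, q_true):
    """Angle in degrees between two orientations given as quaternions.

    Assumption: both quaternions use the same component ordering.
    """
    # Normalise so that unnormalised regressor outputs are still handled.
    n_pred = math.sqrt(sum(c * c for c in q_pred))
    n_true = math.sqrt(sum(c * c for c in q_true))
    dot = sum(a * b for a, b in zip(q_pred, q_true)) / (n_pred * n_true)
    # q and -q encode the same rotation, so take |dot|; clamp for acos.
    dot = min(1.0, abs(dot))
    return math.degrees(2.0 * math.acos(dot))


if __name__ == "__main__":
    half = math.radians(45.0)  # half-angle of a 90-degree rotation
    q_identity = (1.0, 0.0, 0.0, 0.0)
    q_quarter_turn = (math.cos(half), 0.0, 0.0, math.sin(half))
    print(position_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    print(orientation_error_deg(q_identity, q_quarter_turn))
```

Taking the absolute value of the dot product matters because a quaternion and its negation describe the same rotation; without it, a correct prediction with flipped sign would be reported as a large error.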

Citations: cited by 76 publications (34 citation statements)
References: 43 publications

“…conclude that the proposed E-PoseNet achieves the lowest location error across all the outdoor and indoor scenes, and the lowest orientation error across the majority of them. It also competes with most recent transformer-based architectures [15,16] on these datasets.…”
Section: Datasets (mentioning)
confidence: 89%
“…PoseNet has been further improved by combining CNNs and LSTMs for feature correlation [43], introducing temporal information [6], incorporating spatial constraints [1] or by adding additional covisibility constraints based on local maps and the estimated odometry [49]. MS-Transformer [36] is a recent relocalization work based on transformer architecture, achieving the state-of-the-art results.…”
Section: A Learning-based Pose Estimation (mentioning)
confidence: 99%
“…We follow the official data split to train and test our models above this dataset. Task Baselines: Our SelectFusion model is built as an end-to-end relocalization model, and thus we compare with LSTM-Pose [43], VidLoc [6], and MS-Transformer [36] which are representative within this category of learning techniques.…”
Section: A Experimental Setups (mentioning)
confidence: 99%
“…An alternative to explicitly representing the 3D scene geometry via a 3D model is to implicitly store information about the scene in the weights of a machine learning model. Examples include scene coordinate regression techniques [9, 10, 12, 14-16, 75, 90], which regress 2D-3D matches rather than computing them via explicit descriptor matching, and absolute [34,35,48,74,93] and relative pose [3,23,37] regressors. Scene coordinate regressors achieve state-of-the-art results for small scenes [8], but have not yet shown strong performance in more challenging scenes.…”
Section: Introduction (mentioning)
confidence: 99%