2021
DOI: 10.48550/arxiv.2103.11468
Preprint

Learning Multi-Scene Absolute Pose Regression with Transformers

Abstract: Absolute camera pose regressors estimate the position and orientation of a camera from the captured image alone. Typically, a convolutional backbone with a multi-layer perceptron (MLP) head is trained with images and pose labels to embed a single reference scene at a time. Recently, this scheme was extended to learn multiple scenes by replacing the MLP head with a set of fully connected layers. In this work, we propose to learn multi-scene absolute camera pose regression with Transformers, where encoders are use…
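The contrast the abstract draws between a single-scene MLP head and a multi-scene set of fully connected layers can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the paper's method: the convolutional backbone is omitted and replaced by a precomputed feature vector; the pose is assumed to be a 3-D position plus a unit quaternion (a common parameterization that the abstract does not specify); the scene index is assumed to be known at inference time; and the names `PoseHead` and `MultiSceneHeads` are hypothetical. The Transformer-based approach itself is not sketched, since the abstract is truncated before describing it.

```python
import math
import random


def init_linear(in_dim, out_dim, rng, bias=None):
    """Random dense-layer parameters; W is stored as out_dim rows of length in_dim."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, list(bias) if bias is not None else [0.0] * out_dim


def linear(x, params):
    """Dense layer y = W x + b."""
    weights, bias = params
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class PoseHead:
    """Hypothetical single-scene MLP head: feature -> hidden -> (xyz, unit quaternion)."""

    def __init__(self, feat_dim, hidden_dim, rng):
        self.hidden = init_linear(feat_dim, hidden_dim, rng)
        self.position = init_linear(hidden_dim, 3, rng)
        # Bias the rotation output toward the identity quaternion (1, 0, 0, 0)
        # so that an all-zero hidden activation still yields a valid unit quaternion.
        self.rotation = init_linear(hidden_dim, 4, rng, bias=[1.0, 0.0, 0.0, 0.0])

    def __call__(self, feat):
        h = relu(linear(feat, self.hidden))
        xyz = linear(h, self.position)
        q = linear(h, self.rotation)
        norm = math.sqrt(sum(v * v for v in q))
        if norm == 0.0:
            raise ValueError("degenerate quaternion output")
        return xyz, [v / norm for v in q]  # normalize to a unit quaternion


class MultiSceneHeads:
    """Hypothetical multi-scene variant: one independent head per scene, chosen by index."""

    def __init__(self, num_scenes, feat_dim, hidden_dim, rng):
        self.heads = [PoseHead(feat_dim, hidden_dim, rng) for _ in range(num_scenes)]

    def __call__(self, feat, scene_id):
        if not 0 <= scene_id < len(self.heads):
            raise IndexError(f"unknown scene id {scene_id}")
        return self.heads[scene_id](feat)


if __name__ == "__main__":
    rng = random.Random(0)
    model = MultiSceneHeads(num_scenes=4, feat_dim=16, hidden_dim=8, rng=rng)
    feat = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # stand-in for backbone output
    xyz, q = model(feat, scene_id=2)
    print(len(xyz), len(q), round(sum(v * v for v in q), 6))
```

In this sketch, adding a scene adds a full set of head parameters, and the scenes share nothing beyond the (omitted) backbone. That per-scene duplication is the design point the abstract's Transformer-based proposal moves away from.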

Cited by 1 publication (2022), with 3 citation statements, all classified as mentioning. References: 28 publications.
“…Our approach performs similar to existing APR and RPR techniques that also use only a single forward pass in a network [1,8,30,60], but worse than iterative approaches such as [19] or methods that use more densely spaced synthetic views as additional input [41]. Note that these approaches that do not use 3D scene geometry are less accurate than state-of-the-art methods based on 2D-3D correspondences [7,56,58].…”
Section: Methods | Citation type: mentioning | Confidence: 80%
“…Since this task can be considered an "inverse" of the novel view synthesis task [70], we consider the ability to perform both tasks via the same model to be an intriguing property. Even though the localization results are not yet competitive with state-of-the-art localization pipelines, we achieve a similar level of pose accuracy as comparable methods such as [1,60].…”
Section: Introduction | Citation type: mentioning | Confidence: 82%