2020 25th International Conference on Pattern Recognition (ICPR) 2021
DOI: 10.1109/icpr48806.2021.9412225

Do We Really Need Scene-specific Pose Encoders?

Abstract: Visual pose regression models estimate the camera pose from a query image with a single forward pass. Current models learn pose encoding from an image using deep convolutional networks which are trained per scene. The resulting encoding is typically passed to a multi-layer perceptron in order to regress the pose. In this work, we propose that scene-specific pose encoders are not required for pose regression and that encodings trained for visual similarity can be used instead. In order to test our hypothesis, w…
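The pipeline the abstract describes (a fixed image encoding passed to a multi-layer perceptron that regresses the pose) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the pose is represented as a 3D position plus a unit quaternion, the layer sizes are hypothetical, the weights are random and untrained, and the encoding is a random stand-in for the output of an encoder trained for visual similarity.

```python
import math
import random


def linear(x, weights, bias):
    """Affine layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Random, untrained parameters (illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def regress_pose(encoding, hidden, out):
    """Map a fixed image encoding to (position, unit quaternion).

    Assumed pose format: 3 position values followed by 4 quaternion values.
    """
    h = relu(linear(encoding, *hidden))
    raw = linear(h, *out)
    position, q = raw[:3], raw[3:]
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        # Degenerate output: fall back to the identity rotation.
        return position, [1.0, 0.0, 0.0, 0.0]
    return position, [c / norm for c in q]


rng = random.Random(0)
ENCODING_DIM, HIDDEN_DIM = 16, 8  # hypothetical sizes
hidden = init_layer(ENCODING_DIM, HIDDEN_DIM, rng)
out = init_layer(HIDDEN_DIM, 7, rng)

# Stand-in for the output of a pretrained, scene-agnostic encoder.
encoding = [rng.random() for _ in range(ENCODING_DIM)]
position, orientation = regress_pose(encoding, hidden, out)
```

In this sketch only the small head (`hidden`, `out`) would need to be trained per scene, while the encoder that produces `encoding` stays fixed; whether this matches the paper's actual architecture cannot be confirmed from the truncated abstract.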

Cited by 14 publications (14 citation statements)
References 33 publications
“…To the best of our knowledge, the paradigm of regressing the camera pose from the final output of a CNN backbone was adopted by all regressors to date [30]. Variations to the architecture focused on alternatives to the original proposed CNN backbone [20,21,38,31] and on deeper, branching architectures for the MLP head [38,21]. Other works tried to address overfitting by averaging over predictions from models with randomly dropped activations [14] or by reducing the dimensionality of the global image encoding with Long-Short-Term-Memory (LSTM) layers [36].…”
Section: Image-based Camera Pose Estimation (mentioning)
confidence: 99%
“…This formulation was adopted by many pose regressors, however it still requires manually tuning the parameters' initialization for different datasets [34]. In a recent work [31], the authors trained the model separately for position and orientation in order to reduce the need of additional parameters, while achieving comparable accuracy. Alternative representations for the orientation were also proposed to gain better balance and stability of the pose loss [38,5].…”
Section: Image-based Camera Pose Estimation (mentioning)
confidence: 99%