The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs

Rockwell, Chris; Johnson, Justin; Fouhey, David F.

doi:10.1109/3dv57658.2022.00028

Cited by 12 publications

(5 citation statements)

References 58 publications


“…Applying a negative log-likelihood loss for this distribution allows for improved rotation estimation and was successfully used for head pose estimation by Liu et al [32]. Rockwell et al [42] present a baseline to directly estimate the relative pose between two images by training a Vision Transformer (ViT) to bring its computations close to the eight-point algorithm. They achieve competitive results in multiple settings.…”

Section: Related Work (mentioning)

confidence: 99%
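The eight-point algorithm that Rockwell et al use as an inductive bias estimates a fundamental matrix (or, with calibrated coordinates, an essential matrix) from at least eight correspondences. It does this by finding the null vector of a linear system built from the epipolar constraint. Below is a minimal NumPy sketch of the classic normalized (Hartley) variant. It assumes noise-free pixel correspondences given as N x 2 arrays. The function names are hypothetical, and this is the textbook algorithm, not the ViT model from the paper.

```python
import numpy as np


def normalize_points(pts):
    """Hartley normalization: translate the centroid to the origin and scale so
    the mean distance to it is sqrt(2). Returns homogeneous points and the 3x3 transform."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T


def eight_point(x1, x2):
    """Estimate F with x2_h^T F x1_h = 0 from N >= 8 pixel correspondences (N x 2 arrays)."""
    p1, T1 = normalize_points(x1)
    p2, T2 = normalize_points(x2)
    # Each correspondence contributes one row a_n with a_n . vec(F) = 0 (row-major vec).
    A = np.einsum("ni,nj->nij", p2, p1).reshape(len(p1), 9)
    # Least-squares solution with ||vec(F)|| = 1: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization: p = T x, so x2^T (T2^T F T1) x1 = 0.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

The normalization step matters in practice. Raw pixel coordinates make the linear system badly conditioned, and the rank-2 projection afterwards guarantees that all epipolar lines meet at a single epipole.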

“…Attention based methods. We also compare to a recent work by Rockwell et al [42] (8PointVit) using a Vision Transformer (ViT) to estimate the relative pose. Although Rockwell et al achieve competitive results in multiple settings, their approach is less suited for extreme view changes.…”

Section: Comparative Baselines (mentioning)

confidence: 99%

“…The proposed approach is shown to be accurate for both indoor and outdoor scenes and significantly outperforms the baselines schemes in all overlap categories. For nonoverlapping pairs, correspondence-based methods such as SIFT [6], SuperPointNet [13], Reg6D et al [57] and 8PointViT [42] failed to provide any estimates, as they require feature correspondence. The DenseCorrVol approach [7] provides accurate results in extreme cases, but our approach outperforms it.…”

Section: Experimental Comparisons (mentioning)

confidence: 99%


Estimating Extreme 3D Image Rotation with Transformer Cross-Attention

Dekel, Keller (2023). Preprint.


The estimation of large and extreme image rotation plays a key role in multiple computer vision domains, where the rotated images are related by a limited or a nonoverlapping field of view. Contemporary approaches apply convolutional neural networks to compute a 4D correlation volume to estimate the relative rotation between image pairs. In this work, we propose a cross-attention-based approach that utilizes CNN feature maps and a Transformer-Encoder, to compute the cross-attention between the activation maps of the image pairs, which is shown to be an improved equivalent of the 4D correlation volume, used in previous works. In the suggested approach, higher attention scores are associated with image regions that encode visual cues of rotation. Our approach is end-to-end trainable and optimizes a simple regression loss. It is experimentally shown to outperform contemporary state-of-the-art schemes when applied to commonly used image rotation datasets and benchmarks, and establishes a new state-of-the-art accuracy on these datasets. We make our code publicly available.
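The abstract describes cross-attention between two feature maps as an improved equivalent of a 4D correlation volume. The NumPy sketch below shows the relationship. Flattening the correlation volume gives exactly the query-key dot products of single-head cross-attention, and a row-wise softmax turns them into attention weights. It assumes identity query, key and value projections and a single head; a real Transformer encoder would use learned projections and usually several heads. All function names are hypothetical, and this is not the authors' architecture.

```python
import numpy as np


def correlation_volume(f1, f2):
    """4D correlation volume C[i, j, k, l] = <f1[i, j, :], f2[k, l, :]> for (H, W, C) feature maps."""
    return np.einsum("ijc,klc->ijkl", f1, f2)


def cross_attention_weights(f1, f2):
    """Single-head attention weights from every location of image 1 (queries) to every
    location of image 2 (keys), assuming identity query/key projections."""
    h1, w1, c = f1.shape
    h2, w2, _ = f2.shape
    # Flattening the 4D volume gives the (H1*W1) x (H2*W2) matrix of dot products Q K^T.
    logits = correlation_volume(f1, f2).reshape(h1 * w1, h2 * w2) / np.sqrt(c)
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def cross_attend(f1, f2):
    """Attention output for image 1: a weighted sum of image-2 features (identity value projection)."""
    h1, w1, c = f1.shape
    out = cross_attention_weights(f1, f2) @ f2.reshape(-1, c)
    return out.reshape(h1, w1, c)
```

The softmax preserves the ordering within each row. For each query location, the most-attended location in the other image is therefore the one with the highest raw correlation, while the weights become a normalized distribution.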



“…Neural pose prediction from RGB images. A series of methods (Lin et al, 2023a; Rockwell et al, 2022; Cai et al, 2021) have sought to address this issue by directly regressing camera poses through network predictions. Notably, these methods do not incorporate 3D shape information during the camera pose prediction process.…”

Section: Related Work (mentioning)

confidence: 99%

Dense 3D Reconstruction of Non-cooperative Target Based on Pose Measurement

Wang, Wang, Zhao et al. (2023). Communications in Computer and Information Science.


We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images even with little visual overlap, while simultaneously estimating the relative camera poses in ∼1.3 seconds on a single A100 GPU. PF-LRM is a highly scalable method utilizing the self-attention blocks to exchange information between 3D object tokens and 2D image tokens; we predict a coarse point cloud for each view, and then use a differentiable Perspective-n-Point (PnP) solver to obtain camera poses. When trained on a huge amount of multi-view posed data of ∼1M objects, PF-LRM shows strong cross-dataset generalization ability, and outperforms baseline methods by a large margin in terms of pose prediction accuracy and 3D reconstruction quality on various unseen evaluation datasets. We also demonstrate our model's applicability in downstream text/image-to-3D task with fast feed-forward inference. Our project website is at: https://totoro97.github.io/pf-lrm.
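The abstract above describes obtaining camera poses by running a PnP solver on predicted per-view point clouds. For readers unfamiliar with PnP, the NumPy sketch below shows a minimal linear solver based on the Direct Linear Transform (DLT). It assumes calibrated (normalized) image coordinates, at least six non-coplanar 2D-3D correspondences and no noise. It is a textbook method, not the differentiable solver used in PF-LRM, and `dlt_pnp` is a hypothetical name.

```python
import numpy as np


def dlt_pnp(X, x):
    """Estimate R, t with x ~ R @ X + t from N >= 6 3D points X (N x 3) and
    calibrated image points x (N x 2), via the Direct Linear Transform."""
    n = len(X)
    Xh = np.column_stack([X, np.ones(n)])  # N x 4 homogeneous 3D points
    A = np.zeros((2 * n, 12))
    # x = (p1 . Xh) / (p3 . Xh)  =>  p1 . Xh - x * (p3 . Xh) = 0; likewise for y with p2.
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -x[:, [0]] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -x[:, [1]] * Xh
    # The 3x4 matrix P = lambda * [R | t] spans the null space of A.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    # Fix the overall sign so that lambda > 0 (det of the rotation block must be positive).
    if np.linalg.det(P[:, :3]) < 0:
        P = -P
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt               # closest rotation to the (scaled) left 3x3 block
    t = P[:, 3] / S.mean()   # remove the scale lambda from the translation column
    return R, t
```

With noisy points, a linear estimate like this is usually refined by minimizing reprojection error, and a differentiable variant lets gradients flow from the pose back to the predicted points.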


“…Jiang et al [26] embed epipolar geometry constraints into a self-supervised learning framework through the joint optimization of camera poses and optical flow. In [27,28], they use the Eight-Point Algorithm as a neural network inductive bias to regress fundamental or essential matrices. Wang et al [29] employ scale-invariant loss functions to train their model.…”

Section: Two-view Camera Pose Estimation (mentioning)

confidence: 99%

GMIW-Pose: Camera Pose Estimation via Global Matching and Iterative Weighted Eight-Point Algorithm

Chen, Wu, Liao et al. (2023). Electronics.


We propose a novel approach, GMIW-Pose, to estimate the relative camera poses between two views. This method leverages a Transformer-based global matching module to obtain robust 2D–2D dense correspondences, followed by iterative refinement of matching weights using ConvGRU. Ultimately, the camera’s relative pose is determined through the weighted eight-point algorithm. Compared with the previous best two-view pose estimation method, GMIW-Pose reduced the Absolute Trajectory Error (ATE) by 24% on the TartanAir dataset; it achieved the best or second-best performance in multiple scenarios of the TUM-RGBD and KITTI datasets without fine-tuning, among which ATE decreased by 22% on the TUM-RGBD dataset.
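The abstract describes determining the relative pose with a weighted eight-point algorithm. The NumPy sketch below shows one common formulation. Each epipolar constraint is scaled by the square root of its weight before the SVD, so low-weight matches contribute little to the estimate. It assumes calibrated (normalized) coordinates and externally supplied weights; in GMIW-Pose these come from the ConvGRU refinement, but here they are just an input array. `weighted_eight_point` is a hypothetical name, and decomposing E into R and t is omitted.

```python
import numpy as np


def weighted_eight_point(x1, x2, w):
    """Estimate an essential matrix E with x2_h^T E x1_h ~ 0 from calibrated
    correspondences x1, x2 (N x 2) and non-negative weights w (N,), at least 8 non-zero."""
    n = len(x1)
    p1 = np.column_stack([x1, np.ones(n)])
    p2 = np.column_stack([x2, np.ones(n)])
    # One row per correspondence: a_n . vec(E) = x2_h^T E x1_h (row-major vec).
    A = np.einsum("ni,nj->nij", p2, p1).reshape(n, 9)
    # Weighted least squares: minimise sum_n w_n (a_n . e)^2 subject to ||e|| = 1.
    Aw = np.sqrt(w)[:, None] * A
    _, _, Vt = np.linalg.svd(Aw)
    E = Vt[-1].reshape(3, 3)
    # Project onto the essential manifold: two equal singular values, one zero.
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

Setting every weight to 1 reduces this to the unweighted eight-point estimate on calibrated coordinates. Driving the weights of outlier matches toward 0 removes their rows from the system.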
