GP2C: Geometric Projection Parameter Consensus for Joint 3D Pose and Focal Length Estimation in the Wild

Grabner, Alexander; Roth, Peter M.; Lepetit, Vincent

doi:10.1109/iccv.2019.00231

Cited by 17 publications

(22 citation statements)

References 51 publications

“…While instance-level 3D object pose estimation has long been studied in both robotic and vision communities [19,1,21,42,48,43,60,47,37,24], class-level pose estimation has developed more recently thanks to learning-based methods [45,52,51,34,23,55,14,13,56,66,50]. These methods can be roughly divided into two categories: direct pose estimation methods that regress 3D orientations directly [52,45,34,55,63], and keypoint-based methods that predict 2D locations of 3D keypoints [14,13,56,66,50]. However, annotating 3D poses for objects in the wild is a tedious process of searching best-matching CAD models and aligning them to images [59,58].…”

Section: Related Work (mentioning)

confidence: 99%

“…Object Detection on Pix3D. Results are given in AccD 0.5 as defined in [13]. We compare with two methods [55,13] that train a class-specific Mask R-CNN on COCO, then fine-tune on a subset of Pix3D containing the same classes as COCO.…”

Section: Class-agnostic Object Detection and Pose Estimation (mentioning)

confidence: 99%

“…Results are given in AccD 0.5 as defined in [13]. We compare with two methods [55,13] that train a class-specific Mask R-CNN on COCO, then fine-tune on a subset of Pix3D containing the same classes as COCO. In contrast, our agnostic Mask R-CNN is only trained on COCO and can generalize to classes not included in COCO.…”

Section: Class-agnostic Object Detection and Pose Estimation (mentioning)

confidence: 99%

PoseContrast: Class-Agnostic Object Viewpoint Estimation in the Wild with Pose-Aware Contrastive Learning

Xiao, Du, Marlet

2021

Preprint

Motivated by the need of estimating the pose (viewpoint) of arbitrary objects in the wild, which is only covered by scarce and small datasets, we consider the challenging problem of class-agnostic 3D object pose estimation, with no 3D shape knowledge. The idea is to leverage features learned on seen classes to estimate the pose for classes that are unseen, yet that share similar geometries and canonical frames with seen classes. For this, we train a direct pose estimator in a class-agnostic way by sharing weights across all object classes, and we introduce a contrastive learning method that has three main ingredients: (i) the use of pretrained, self-supervised, contrast-based features; (ii) pose-aware data augmentations; (iii) a pose-aware contrastive loss. We experimented on Pascal3D+ and ObjectNet3D, as well as Pix3D in a cross-dataset fashion, with both seen and unseen classes. We report state-of-the-art results, including against methods that use additional shape information, and also when we use detected bounding boxes.

“…Detection and tracking in 3D space from video sequences is a relatively unexplored area due to the difficulty in the 6-DoF (six degrees of freedom) pose estimation. In order to accurately estimate 3D positions and poses, many methods [13,23] leverages a predefined object template or priors to jointly infer object depth and rotations. In ClusterVO, the combination of low-level geometric feature descriptors and semantic detections inferred simultaneously in the localization and mapping process can provide additional cues for efficient tracking and accurate object pose estimation.…”

Section: Related Work (mentioning)

confidence: 99%

ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings

Huang, Yang, Mu, et al.

2020

Preprint

We present ClusterVO, a stereo Visual Odometry which simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects. Unlike previous solutions relying on batch input or imposing priors on scene structure or dynamic object models, ClusterVO is online, general and thus can be used in various scenarios including indoor scene understanding and autonomous driving. At the core of our system lies a multi-level probabilistic association mechanism and a heterogeneous Conditional Random Field (CRF) clustering approach combining semantic, spatial and motion information to jointly infer cluster segmentations online for every frame. The poses of camera and dynamic objects are instantly solved through a sliding-window optimization. Our system is evaluated on Oxford Multimotion and KITTI dataset both quantitatively and qualitatively, reaching comparable results to state-of-the-art solutions on both odometry and dynamic trajectory recovery.

“…Thus, it is important to know which pixels belong to an object and which pixels belong to the background or another object [7,8]. Recent works showed that deep learning techniques for instance segmentation [17] significantly increase the accuracy on this task [15,26,59]. However, until now location fields have only been used for 3D pose estimation, but not for 3D model retrieval or other tasks.…”

Section: Location Fields (mentioning)

confidence: 99%

Location Field Descriptors: Single Image 3D Model Retrieval in the Wild

Grabner, Roth, Lepetit

2019

2019 International Conference on 3D Vision (3DV)

Self Cite

We present Location Field Descriptors, a novel approach for single image 3D model retrieval in the wild. In contrast to previous methods that directly map 3D models and RGB images to an embedding space, we establish a common low-level representation in the form of location fields from which we compute pose invariant 3D shape descriptors. Location fields encode correspondences between 2D pixels and 3D surface coordinates and, thus, explicitly capture 3D shape and 3D pose information without appearance variations which are irrelevant for the task. This early fusion of 3D models and RGB images results in three main advantages: First, the bottleneck location field prediction acts as a regularizer during training. Second, major parts of the system benefit from training on a virtually infinite amount of synthetic data. Finally, the predicted location fields are visually interpretable and unblackbox the system. We evaluate our proposed approach on three challenging real-world datasets (Pix3D, Comp, and Stanford) with different object categories and significantly outperform the state-of-the-art by up to 20% absolute in multiple 3D retrieval metrics.
