This paper presents the 6D vision transformer (6D-ViT), a transformer-based instance representation learning network for highly accurate category-level object pose estimation from RGB-D images. Specifically, a novel two-stream encoder-decoder framework is designed to extract rich instance representations from RGB images, point clouds, and categorical shape priors. The framework consists of two main branches, named Pixelformer and Pointformer. Pixelformer combines a pyramid transformer encoder with an all-multilayer-perceptron (MLP) decoder to extract pixelwise appearance representations from RGB images, while Pointformer relies on a cascaded transformer encoder and an all-MLP decoder to acquire pointwise geometric features from point clouds. Dense instance representations (i.e., a correspondence matrix and a deformation field) are then obtained from a multisource aggregation (MSA) network that takes the shape prior, appearance, and geometric information as input. Finally, the instance 6D pose is computed by leveraging the correspondences among the dense representations, the shape prior, and the instance point cloud. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed instance representation learning framework achieves state-of-the-art performance and significantly outperforms existing methods. Our source code will be made publicly available.
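
The following is a minimal sketch of the pipeline described above, assuming PyTorch. The module names Pixelformer, Pointformer, and MSA come from the abstract, but their internals, all tensor shapes, and the Umeyama similarity solver are illustrative placeholders chosen by us (Umeyama alignment is the standard pose-recovery step in shape-prior methods), not the authors' implementation:

```python
# Illustrative sketch only: interfaces and shapes are assumptions,
# not the authors' code.
import torch
import torch.nn as nn


class SixDViT(nn.Module):
    def __init__(self, pixelformer: nn.Module, pointformer: nn.Module, msa: nn.Module):
        super().__init__()
        self.pixelformer = pixelformer  # pyramid transformer encoder + all-MLP decoder (RGB stream)
        self.pointformer = pointformer  # cascaded transformer encoder + all-MLP decoder (point stream)
        self.msa = msa                  # multisource aggregation network

    def forward(self, rgb, points, prior):
        # rgb:    (B, 3, H, W) cropped instance image
        # points: (B, N, 3) back-projected instance point cloud
        # prior:  (B, M, 3) categorical shape prior
        appearance = self.pixelformer(rgb)    # (B, N, Ca) pixelwise appearance features
        geometry = self.pointformer(points)   # (B, N, Cg) pointwise geometric features
        # MSA fuses both streams with the shape prior into the two
        # dense instance representations named in the abstract.
        corr, deform = self.msa(appearance, geometry, prior)
        # corr:   (B, N, M) correspondence matrix between points and prior
        # deform: (B, M, 3) deformation field warping the prior to the instance
        return corr, deform


def recover_pose(corr, deform, prior, points):
    """Recover a similarity transform for one instance (no batch dim)."""
    model = prior + deform           # deformed prior approximates the instance model
    matched = corr @ model           # (N, 3) model points matched to each observed point
    return umeyama(matched, points)  # align model-frame points to camera-frame points


def umeyama(src, dst):
    """Least-squares similarity transform with dst ~ s * R @ src + t."""
    n = src.shape[0]
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / n
    U, S, Vh = torch.linalg.svd(cov)
    d = torch.sign(torch.linalg.det(U) * torch.linalg.det(Vh))
    sign = torch.tensor([1.0, 1.0, float(d)])
    R = U @ torch.diag(sign) @ Vh                 # proper rotation (det = +1)
    s = (S * sign).sum() / (xs.pow(2).sum() / n)  # scale from source variance
    t = mu_d - s * (R @ mu_s)
    return s, R, t
```

In this sketch, pose recovery is deliberately decoupled from the network: the learned correspondence matrix and deformation field fully determine the dense 3D-3D matches, so the 6D pose (plus scale) follows from a closed-form alignment rather than direct regression.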