CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation

Irshad, Muhammad Zubair; Kollar, Thomas; Laskey, Michael; Stone, Kevin; Kira, Zsolt

doi:10.1109/icra46639.2022.9811799

Cited by 37 publications

(17 citation statements)

References 43 publications

“…Notably, only SDFEst [23] supports multiview setups, only iCaps [22] supports tracking over time, and only three methods include the detection part in their pipeline [25,49,24]. The other methods assume that an off-the-shelf detector (typically Mask R-CNN [10]) is available, but do not train it end-to-end with the pose and shape estimation part.…”

Section: Related Work (mentioning)

confidence: 99%

RGB-D-Based Categorical Object Pose and Shape Estimation: Methods, Datasets, and Evaluation

Bruns

2023

Preprint

Recently, various methods for 6D pose and shape estimation of objects at a per-category level have been proposed. This work provides an overview of the field in terms of methods, datasets, and evaluation protocols. First, an overview of existing works and their commonalities and differences is provided. Second, we take a critical look at the predominant evaluation protocol, including metrics and datasets. Based on the findings, we propose a new set of metrics, contribute new annotations for the Redwood dataset, and evaluate state-of-the-art methods in a fair comparison. The results indicate that existing methods do not generalize well to unconstrained orientations and are actually heavily biased towards objects being upright. We provide an easy-to-use evaluation toolbox with well-defined metrics, methods, and dataset interfaces, which allows evaluation and comparison with various state-of-the-art approaches (https://github.com/roym899/pose_and_shape_evaluation).

“…Most methods (including ours) employ two-stage pipelines in which an object detection or instance segmentation module first detects bounding boxes or masks, which are later used to estimate the object's pose and shape. In contrast, Irshad et al. [25] proposed to use a single-shot architecture to detect objects and estimate their shape and pose jointly. While such an end-to-end approach might be easier to scale, data collection and data generation become more challenging compared to two-stage approaches, which can benefit from large-scale segmentation datasets.…”

Section: B Categorical Pose and Shape Estimation (mentioning)

confidence: 99%

SDFEst: Categorical Pose and Shape Estimation of Objects From RGB-D Using Signed Distance Fields

Bruns

2022

IEEE Robot. Autom. Lett.

Rich geometric understanding of the world is an important component of many robotic applications such as planning and manipulation. In this paper, we present a modular pipeline for pose and shape estimation of objects from RGB-D images given their category. The core of our method is a generative shape model, which we integrate with a novel initialization network and a differentiable renderer to enable 6D pose and shape estimation from a single or multiple views. We investigate the use of discretized signed distance fields as an efficient shape representation for fast analysis-by-synthesis optimization. Our modular framework enables multi-view optimization and extensibility. We demonstrate the benefits of our approach over state-of-the-art methods in several experiments on both synthetic and real data. We open-source our approach at https://github.com/roym899/sdfest.

“…CASS [34] learns a canonical shape space by VAE [35] to obtain a view-factorized RGB-D embedding. CenterSnap [36] presents a one-stage pipeline to reduce the computational cost. 6-PACK [37] recovers the pose by tracking inter-frame motion of the object.…”

Section: B Category-level Object Pose Estimation (mentioning)

confidence: 99%

SSP-Pose: Symmetry-Aware Shape Prior Deformation for Direct Category-Level Object Pose Estimation

Zhang, Yan, Manhardt et al.

2022

Preprint

Category-level pose estimation is a challenging problem due to intra-class shape variations. Recent methods deform pre-computed shape priors to map the observed point cloud into the normalized object coordinate space and then retrieve the pose via post-processing, i.e., Umeyama's Algorithm. The shortcomings of this two-stage strategy lie in two aspects: 1) The surrogate supervision on the intermediate results cannot directly guide the learning of pose, resulting in large pose error after post-processing. 2) The inference speed is limited by the post-processing step. In this paper, to handle these shortcomings, we propose an end-to-end trainable network SSP-Pose for category-level pose estimation, which integrates shape priors into a direct pose regression network. SSP-Pose stacks four individual branches on a shared feature extractor, where two branches are designed to deform and match the prior model with the observed instance, and the other two branches are applied for directly regressing the full 9 degrees-of-freedom pose and performing symmetry reconstruction and point-wise inlier mask prediction respectively. Consistency loss terms are then naturally exploited to align the outputs of different branches and promote the performance. During inference, only the direct pose regression branch is needed. In this manner, SSP-Pose not only learns category-level pose-sensitive characteristics to boost performance but also keeps a real-time inference speed. Moreover, we utilize the symmetry information of each category to guide the shape prior deformation, and propose a novel symmetry-aware loss to mitigate the matching ambiguity. Extensive experiments on public datasets demonstrate that SSP-Pose produces superior performance compared with competitors with a real-time inference speed at about 25Hz. The codes will be released soon.
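The post-processing step the abstract refers to (Umeyama's algorithm) can be illustrated with a short NumPy sketch. This is not code from SSP-Pose or CenterSnap. It is a minimal version of the standard closed-form similarity alignment, and the function name `umeyama_alignment` and its argument names are hypothetical. It assumes `src` holds predicted normalized object coordinates and `dst` the corresponding observed points, with one-to-one correspondences and no outliers.

```python
import numpy as np


def umeyama_alignment(src, dst):
    """Least-squares similarity transform with dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points (hypothetical inputs).
    Returns the scale s, the 3x3 rotation R, and the translation t.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets on their means.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between target and source, then its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    # Scale from the source variance, translation from the two means.
    var_src = (src_c ** 2).sum() / n
    s = float((D * np.diag(S)).sum() / var_src)
    t = mu_dst - s * (R @ mu_src)
    return s, R, t
```

In practice this step is usually wrapped in an outlier-rejection loop such as RANSAC, which is one reason the abstract describes post-processing as a limit on inference speed; the sketch omits that loop.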
