“…To recover object shapes, some methods [Groueix et al 2018; Wang et al 2018] reconstruct meshes from a template, while others [Huang et al 2018b; Izadinia et al 2017] adopt shape retrieval approaches that search a given CAD database. Recently, some approaches [Dahnert et al 2021; Nie et al 2020; Popov et al 2020; Yang and Zhang 2016; Zhang et al 2021b] enable 3D scene understanding by generating a room layout, camera pose, object bounding boxes, or even meshes from a single view, automatically completing and annotating scene meshes [Bokhovkin et al 2021], or predicting object alignments and layouts [Avetisyan et al 2020] from an RGB-D scan. Inspired by the observation of PanoContext [Zhang et al 2014] that panoramic images contain richer context information than perspective ones, Zhang et al [2021a] propose an improved 3D scene understanding method that takes panoramic captures as input.…”
“…In comparison, our focus is on learning part-based semantic and instance segmentation of noisy and fragmented real-world 3D scans. Very recently, initial approaches to semantic 3D segmentation have been proposed (Bokhovkin et al, 2021; Uy et al, 2019), but for a significantly less extensive part hierarchy. More specifically, (Bokhovkin et al, 2021) predicts the part hierarchy only at the object and coarse-part levels, discarding smaller parts altogether; in contrast, we are able to predict parts at finer levels of the hierarchy.…”
We propose Scan2Part, a method to segment individual parts of objects in real-world, noisy indoor RGB-D scans. To this end, we vary the part hierarchies of objects in indoor scenes and explore their effect on scene understanding models. Specifically, we use a sparse U-Net-based architecture that captures the fine-scale detail of the underlying 3D scan geometry by leveraging a multi-scale feature hierarchy. To train our method, we introduce the Scan2Part dataset, the first large-scale collection providing detailed semantic labels at the part level in a real-world setting. In total, we provide 242,081 correspondences between 53,618 PartNet parts of 2,477 ShapeNet objects and 1,506 ScanNet scenes, at two spatial resolutions of 2 cm³ and 5 cm³. As output, we are able to predict fine-grained per-object part labels, even when the geometry is coarse or partially missing. Overall, we believe that both our method and the newly introduced dataset are a stepping stone toward structural understanding of real-world 3D environments.
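The multi-scale feature hierarchy over a voxel grid that the abstract mentions can be illustrated with a minimal NumPy sketch. This is a hypothetical toy example, not the authors' sparse U-Net implementation: it builds the coarser levels of a voxel occupancy pyramid by factor-2 max pooling, as a U-Net encoder over a dense grid would.

```python
import numpy as np

def downsample_occupancy(grid, factor=2):
    """Max-pool a 3D occupancy grid by `factor` along each axis,
    mimicking one encoder level of a voxel U-Net."""
    d, h, w = grid.shape
    # Crop so each dimension divides evenly, then block-reduce with max.
    g = grid[: d - d % factor, : h - h % factor, : w - w % factor]
    g = g.reshape(d // factor, factor, h // factor, factor, w // factor, factor)
    return g.max(axis=(1, 3, 5))

# Toy 8^3 scan (e.g. at 2 cm voxel size): occupy one corner block.
fine = np.zeros((8, 8, 8), dtype=np.uint8)
fine[:2, :2, :2] = 1

# Three-level hierarchy (8^3 -> 4^3 -> 2^3), as in a U-Net encoder path.
pyramid = [fine]
for _ in range(2):
    pyramid.append(downsample_occupancy(pyramid[-1]))

print([p.shape for p in pyramid])  # [(8, 8, 8), (4, 4, 4), (2, 2, 2)]
```

In the actual method the grids are sparse and carry learned features rather than binary occupancy, but the pyramid structure is the same: coarser levels summarize larger spatial context, and the decoder fuses them back with the fine levels.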
“…In the domain of object modeling, part-based approaches leverage computer vision techniques to track movements among object parts [22,23], exploit contextual relations from large datasets [24–26], or develop data-efficient learning methods [27,28]. These approaches aim to recognize and segment object parts, enhancing the understanding of complex object structures, but they do not yield a holistic representation of a scene encompassing multiple objects.…”
Existing methods for reconstructing interactive scenes primarily focus on replacing reconstructed objects with CAD models retrieved from a limited database, resulting in significant discrepancies between the reconstructed and observed scenes. To address this issue, our work introduces a part-level reconstruction approach that reassembles objects from primitive shapes. This enables us to precisely replicate the observed physical scenes and simulate robot interactions with both rigid and articulated objects. By segmenting reconstructed objects into semantic parts and aligning primitive shapes to these parts, we assemble them as CAD models while estimating kinematic relations, including parent-child contact relations, joint types, and parameters. Specifically, we derive the optimal primitive alignment by solving a series of optimization problems, and we estimate kinematic relations based on part semantics and geometry. Our experiments demonstrate that part-level scene reconstruction outperforms object-level reconstruction by accurately capturing finer details and improving precision. These reconstructed part-level interactive scenes provide valuable kinematic information for various robotic applications; we showcase the feasibility of certifying mobile manipulation planning in these interactive scenes before executing tasks in the physical world.
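The primitive-alignment step described above can be sketched in miniature. The following is a hypothetical illustration, not the paper's optimization: it fits an oriented bounding box (one common primitive) to a segmented part's point cloud via PCA, recovering a center, an orientation, and per-axis extents. The slab dimensions and the `fit_oriented_box` helper are assumptions for this example.

```python
import numpy as np

def fit_oriented_box(points):
    """Fit an oriented bounding box to a part's point cloud via PCA:
    a closed-form stand-in for a primitive-alignment optimization."""
    center = points.mean(axis=0)
    centered = points - center
    # Principal axes come from the right singular vectors of the
    # centered points (eigenvectors of the covariance matrix).
    _, _, axes = np.linalg.svd(centered, full_matrices=False)
    # Express the points in the box's local frame and measure extents.
    local = centered @ axes.T
    extents = local.max(axis=0) - local.min(axis=0)
    return center, axes, extents

rng = np.random.default_rng(0)
# Hypothetical "part": points sampled from a 0.6 x 0.4 x 0.05 m slab,
# roughly the shape of a tabletop segment.
pts = rng.uniform(-0.5, 0.5, size=(2000, 3)) * np.array([0.6, 0.4, 0.05])
center, axes, extents = fit_oriented_box(pts)
print(np.round(extents, 2))  # extents sorted by variance; close to the slab dimensions
```

A full pipeline like the one described would additionally choose among primitive types (boxes, cylinders, etc.), refine the fit under contact constraints, and then infer joints between adjacent primitives from part semantics and geometry.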