Deep learning approaches have made tremendous progress in the field of semantic segmentation over the past few years. However, most current approaches operate in the 2D image space. Direct semantic segmentation of unstructured 3D point clouds is still an open research problem. The recently proposed PointNet architecture presents an interesting step forward in that it can operate on unstructured point clouds, achieving encouraging segmentation results. However, it subdivides the input points into a grid of blocks and processes each such block individually. In this paper, we investigate how such an architecture can be extended to incorporate larger-scale spatial context. We build upon PointNet and propose two extensions that enlarge the receptive field over the 3D scene. We evaluate the proposed strategies on challenging indoor and outdoor datasets and show improved results in both scenarios.
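To make the limited-context setting concrete, here is a minimal sketch (not the authors' code) of how PointNet-style pipelines typically partition a scene into ground-plane blocks that are then processed independently; the block size and helper name are illustrative assumptions.

```python
# Minimal sketch: grouping a point cloud into ground-plane blocks, as done by
# PointNet-style pipelines. Each block is later processed independently, which
# is the limited-context behaviour the extensions above aim to overcome.
import numpy as np

def partition_into_blocks(points, block_size=1.0):
    """Group points (N, 3) into square blocks of `block_size` metres in x/y."""
    block_ids = np.floor(points[:, :2] / block_size).astype(np.int64)
    blocks = {}
    for idx, key in enumerate(map(tuple, block_ids)):
        blocks.setdefault(key, []).append(idx)
    # One index array per occupied block.
    return [np.asarray(v) for v in blocks.values()]

if __name__ == "__main__":
    pts = np.random.rand(10000, 3) * np.array([10.0, 10.0, 3.0])  # toy scene
    blocks = partition_into_blocks(pts, block_size=1.0)
    print(len(blocks), "blocks; mean points per block:",
          np.mean([len(b) for b in blocks]))
```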
Fig. 1. We present a deep learning framework that predicts a semantic label for each point in a given 3D point cloud. The main components of our approach are point neighborhoods in different feature spaces and dedicated loss functions which help to refine the learned feature spaces. Left: point clouds from indoor and outdoor scenes. Right: semantic segmentation results produced by the presented method.

Abstract. In this paper, we present a deep learning architecture which addresses the problem of 3D semantic segmentation of unstructured point clouds. Compared to previous work, we introduce grouping techniques which define point neighborhoods in the initial world space and in the learned feature space. Neighborhoods are important as they allow computing local or global point features depending on the spatial extent of the neighborhood. Additionally, we incorporate dedicated loss functions to further structure the learned point feature space: the pairwise distance loss and the centroid loss. We show how to apply these mechanisms to the task of 3D semantic segmentation of point clouds and report state-of-the-art performance on indoor and outdoor datasets.
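As a rough illustration of how such feature-space losses are commonly formulated, the PyTorch sketch below implements a margin-based pairwise distance loss and a centroid loss on learned point features. The margins and the dense pair construction are assumptions for illustration and may differ from the definitions used in the paper.

```python
# Hedged sketch of the two auxiliary losses named in the abstract (PyTorch).
import torch

def pairwise_distance_loss(features, labels, pos_margin=0.0, neg_margin=1.0):
    """Pull same-label point features together, push different labels apart.
    features: (N, D) learned point features, labels: (N,) semantic labels."""
    dists = torch.cdist(features, features)                  # (N, N) distances
    same = (labels[:, None] == labels[None, :]).float()
    pos = torch.clamp(dists - pos_margin, min=0.0) * same          # attract
    neg = torch.clamp(neg_margin - dists, min=0.0) * (1.0 - same)  # repel
    return (pos + neg).mean()

def centroid_loss(features, labels):
    """Pull every point feature towards the centroid of its semantic class."""
    loss = features.new_zeros(())
    classes = labels.unique()
    for c in classes:
        mask = labels == c
        centroid = features[mask].mean(dim=0, keepdim=True)
        loss = loss + (features[mask] - centroid).norm(dim=1).mean()
    return loss / classes.numel()

feats = torch.randn(256, 64)           # toy learned point features
lbls = torch.randint(0, 5, (256,))     # toy semantic labels
print(pairwise_distance_loss(feats, lbls).item(), centroid_loss(feats, lbls).item())
```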
In this work, we propose Dilated Point Convolutions (DPC), which drastically increase the receptive field of convolutions on 3D point clouds. As we show in our experiments, the size of the receptive field is directly related to the performance of dense prediction tasks such as semantic segmentation. We examine different network architectures and mechanisms for increasing the receptive field size of point convolutions and, in particular, propose dilated point convolutions. Importantly, our dilation mechanism can easily be integrated into all existing methods using nearest-neighbor-based point convolutions. To evaluate the resulting network architectures, we visualize the receptive field and report competitive scores on the task of 3D semantic segmentation on the S3DIS and ScanNet datasets.
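A small sketch of the dilated neighbor selection such a mechanism implies: instead of taking the k nearest neighbors, take every d-th neighbor among the k·d nearest ones, which widens the receptive field without increasing the number of aggregated points. The function below is illustrative, not the reference implementation.

```python
# Sketch of dilated k-nearest-neighbour selection for point convolutions.
import torch

def dilated_knn(points, k=16, dilation=4):
    """points: (N, 3). Returns (N, k) neighbour indices with dilation factor d."""
    dists = torch.cdist(points, points)                      # (N, N) distances
    # Take the k*d nearest neighbours, then keep every d-th of them.
    idx = dists.topk(k * dilation, largest=False).indices    # (N, k*d)
    return idx[:, ::dilation]                                # (N, k)

pts = torch.rand(1024, 3)
neigh_plain = dilated_knn(pts, k=16, dilation=1)    # standard kNN support
neigh_dilated = dilated_knn(pts, k=16, dilation=4)  # same k, wider support
print(neigh_plain.shape, neigh_dilated.shape)
```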
Abstract. Recent deep learning models achieve impressive results on 3D scene analysis tasks by operating directly on unstructured point clouds. Considerable progress has been made in object classification and semantic segmentation, but the task of instance segmentation is currently less explored. In this work, we present 3D-BEVIS (3D bird's-eye-view instance segmentation), a deep learning framework for joint semantic and instance segmentation of 3D point clouds. Following the idea of previous proposal-free instance segmentation approaches, our model learns a feature embedding and groups the obtained feature space into semantic instances. Current point-based methods process local sub-parts of a full scene independently, followed by a heuristic merging step. However, performing instance segmentation by clustering over a full scene requires globally consistent features. We therefore propose to combine local point geometry with global context information using an intermediate bird's-eye-view representation.
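To illustrate what such an intermediate bird's-eye-view representation can look like, the sketch below pools point features into a 2D ground-plane grid on which a standard 2D network could propagate global context. The grid resolution and max-pooling choice are assumptions for illustration, not the 3D-BEVIS design.

```python
# Illustrative sketch: pooling point features into a bird's-eye-view grid.
import numpy as np

def points_to_bev(points, features, cell_size=0.1, grid_dims=(128, 128)):
    """points: (N, 3), features: (N, D). Returns an (H, W, D) max-pooled BEV grid."""
    h, w = grid_dims
    cols = np.clip((points[:, 0] / cell_size).astype(int), 0, w - 1)
    rows = np.clip((points[:, 1] / cell_size).astype(int), 0, h - 1)
    bev = np.zeros((h, w, features.shape[1]), dtype=features.dtype)
    # Max-pool the features of all points falling into the same grid cell.
    np.maximum.at(bev, (rows, cols), features)
    return bev

pts = np.random.rand(5000, 3) * 12.0                  # toy scene, ~12 m extent
feats = np.random.rand(5000, 32).astype(np.float32)   # toy point features
print(points_to_bev(pts, feats).shape)                # (128, 128, 32)
```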