We introduce Similarity Group Proposal Network (SGPN), a simple and intuitive deep learning framework for 3D object instance segmentation on point clouds. SGPN uses a single network to predict point grouping proposals and a corresponding semantic class for each proposal, from which we can directly extract instance segmentation results. Important to the effectiveness of SGPN is its novel representation of 3D instance segmentation results in the form of a similarity matrix that indicates the similarity between each pair of points in embedded feature space, thus producing an accurate grouping proposal for each point. Experimental results on various 3D scenes show the effectiveness of our method on 3D instance segmentation, and we also evaluate the capability of SGPN to improve 3D object detection and semantic segmentation results. We also demonstrate its flexibility by seamlessly incorporating 2D CNN features into the framework to boost performance.
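The similarity matrix at the core of SGPN can be illustrated with a small sketch: given per-point feature embeddings, entry S[i, j] is the distance between points i and j in embedded feature space, and thresholding row i yields a grouping proposal for point i. The function names and the threshold value below are illustrative assumptions, not from the paper.

```python
import numpy as np

def similarity_matrix(features):
    """Pairwise squared L2 distances between per-point embeddings (N x D)."""
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once
    sq = np.sum(features ** 2, axis=1)
    s = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    return np.maximum(s, 0.0)  # clamp tiny negatives from floating-point rounding

def group_proposal(sim, i, thresh=0.5):
    """Indices of points whose embedded distance to point i is below thresh."""
    return np.flatnonzero(sim[i] < thresh)
```

For instance, with two well-separated clusters of embeddings, each point's row in the matrix proposes exactly its own cluster; the network is trained so that points of the same instance end up close in this space.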
Point clouds are an efficient data format for 3D data. However, existing 3D segmentation methods for point clouds either do not model local dependencies [22] or require added computations [15,24]. This work presents a novel 3D segmentation framework, RSNet, to efficiently model local structures in point clouds. The key component of the RSNet is a lightweight local dependency module. It is a combination of a novel slice pooling layer, Recurrent Neural Network (RNN) layers, and a slice unpooling layer. The slice pooling layer is designed to project features of unordered points onto an ordered sequence of feature vectors so that traditional end-to-end learning algorithms (RNNs) can be applied. The performance of RSNet is validated by comprehensive experiments on the S3DIS [1], ScanNet [3], and ShapeNet [35] datasets. In its simplest form, RSNet surpasses all previous state-of-the-art methods on these benchmarks, and comparisons against previous state-of-the-art methods [22,24] demonstrate its efficiency.
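The slice pooling/unpooling idea can be sketched as follows: points are binned into ordered slices along one axis, features are max-pooled within each slice to form an ordered sequence (over which the RNN layers would run), and unpooling copies each slice's feature back to its points. This is a minimal sketch under assumed conventions (uniform bins, max pooling); the function names are hypothetical.

```python
import numpy as np

def slice_pool(points, features, n_slices, axis=2):
    """Max-pool point features into n_slices ordered bins along one axis."""
    coords = points[:, axis]
    edges = np.linspace(coords.min(), coords.max(), n_slices + 1)
    # assign each point to a slice index in [0, n_slices)
    idx = np.clip(np.searchsorted(edges, coords, side='right') - 1, 0, n_slices - 1)
    pooled = np.zeros((n_slices, features.shape[1]))
    for s in range(n_slices):
        mask = idx == s
        if mask.any():
            pooled[s] = features[mask].max(axis=0)  # max pooling within the slice
    return pooled, idx

def slice_unpool(pooled, idx):
    """Copy each slice's (RNN-processed) feature back to every point in it."""
    return pooled[idx]
```

Between `slice_pool` and `slice_unpool`, the pooled sequence is ordered along the slicing axis, which is what makes RNN layers applicable to otherwise unordered points.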
Convolutional neural networks (CNNs) are limited in their ability to handle geometric information due to the fixed grid kernel structure. The availability of depth data enables progress in RGB-D semantic segmentation with CNNs. State-of-the-art methods either use depth as additional images or process spatial information in 3D volumes or point clouds. These methods suffer from high computation and memory cost. To address these issues, we present Depth-aware CNN by introducing two intuitive, flexible and effective operations: depth-aware convolution and depth-aware average pooling. By leveraging depth similarity between pixels in the process of information propagation, geometry is seamlessly incorporated into the CNN. Without introducing any additional parameters, both operators can be easily integrated into existing CNNs. Extensive experiments and ablation studies on challenging RGB-D semantic segmentation benchmarks validate the effectiveness and flexibility of our approach.
Introduction

Recent advances [1,2,3] in CNNs have achieved significant success in scene understanding. With the help of range sensors (such as Kinect, LiDAR, etc.), depth images are available along with RGB images. Taking advantage of these two complementary modalities with CNNs can improve the performance of scene understanding. However, CNNs are limited in modeling geometric variance due to their fixed grid computation structure. Incorporating the geometric information from depth images into CNNs is important yet challenging.

Extensive studies [4,5,6,7,8,9,10] have been carried out on this task. FCN [1] and its successors treat depth as another input image and construct two CNNs to process RGB and depth separately. This doubles the number of network parameters and the computation cost. In addition, the two-stream network architecture still suffers from the fixed geometric structure of CNNs: even if the geometric relation between two pixels is given, this relation cannot be used in the information propagation of the CNN. An alternative is to leverage 3D networks [4,11,12] to handle geometry. Nevertheless, both volumetric CNNs [11] and 3D point cloud graph networks [4] are computationally more expensive than 2D CNNs. Despite these encouraging results, we seek a more flexible and efficient way to exploit 3D geometric information in 2D CNNs.
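The depth-aware convolution idea can be sketched as a standard 2D convolution in which each neighbor's contribution is additionally weighted by its depth similarity to the window center, e.g. exp(-alpha * |d_center - d_neighbor|). The following is a minimal single-channel sketch under assumed conventions ('valid' padding, an assumed alpha); it is illustrative, not the paper's implementation, and note it adds no learnable parameters beyond the kernel itself.

```python
import numpy as np

def depth_aware_conv(image, depth, kernel, alpha=8.0):
    """'Valid' 2D convolution with each neighbor weighted by depth similarity
    to the window center: exp(-alpha * |d_center - d_neighbor|)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            dpatch = depth[y:y + kh, x:x + kw]
            center = dpatch[kh // 2, kw // 2]
            # pixels at a similar depth to the center get weight near 1;
            # pixels across a depth discontinuity are suppressed toward 0
            sim = np.exp(-alpha * np.abs(dpatch - center))
            out[y, x] = np.sum(kernel * sim * patch)
    return out
```

With a flat depth map the similarity term is 1 everywhere and the operator reduces to an ordinary convolution; across a depth discontinuity, contributions from the far side are damped, which is how geometry enters the information propagation without extra parameters.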