Figure 1: Our dataset provides dense annotations for each scan of all sequences from the KITTI Odometry Benchmark [19]. Here, we show multiple scans aggregated using pose information estimated by a SLAM approach.

Abstract. Semantic scene understanding is important for various applications. In particular, self-driving cars need a fine-grained understanding of the surfaces and objects in their vicinity. Light detection and ranging (LiDAR) provides precise geometric information about the environment and is thus part of the sensor suites of almost all self-driving cars. Despite the relevance of semantic scene understanding for this application, there is a lack of large datasets for this task that are based on an automotive LiDAR. In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete 360° field of view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using multiple past scans, and (iii) semantic scene completion, which requires anticipating the semantic scene in the future. We provide baseline experiments and show that more sophisticated models are needed to tackle these tasks efficiently. Our dataset opens the door for the development of more advanced methods, but also provides plentiful data to investigate new research directions.

* indicates equal contribution
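The aggregation shown in Figure 1 amounts to transforming each scan into a common world frame with its estimated pose and stacking the results. The following is a minimal sketch of that step, not code from the paper; the function name and data layout (N×3 scans, 4×4 homogeneous poses) are illustrative assumptions.

```python
# Hypothetical sketch: aggregate multiple LiDAR scans into one world-frame
# cloud using per-scan 4x4 poses (e.g., estimated by a SLAM system).
import numpy as np

def aggregate_scans(scans, poses):
    """Transform each Nx3 scan by its 4x4 pose and stack the results."""
    world_points = []
    for points, pose in zip(scans, poses):
        # Convert to homogeneous coordinates: (N, 3) -> (N, 4).
        homo = np.hstack([points, np.ones((points.shape[0], 1))])
        # Apply the rigid-body transform and keep the xyz part.
        world_points.append((homo @ pose.T)[:, :3])
    return np.vstack(world_points)
```

With a pose that is a pure translation, each point is simply shifted by the translation vector, which makes the convention easy to verify.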
Abstract. A common practice for gaining invariant features in object recognition models is to aggregate multiple low-level features over a small neighborhood. However, the differences between those models make a direct comparison of the properties of different aggregation functions difficult. Our aim is to gain insight into these functions by comparing them directly on a fixed architecture across several common object recognition tasks. Empirical results show that a maximum pooling operation significantly outperforms subsampling operations. Despite their shift-invariant properties, overlapping pooling windows offer no significant improvement over non-overlapping pooling windows. By applying this knowledge, we achieve state-of-the-art error rates of 4.57% on the NORB normalized-uniform dataset and 5.6% on the NORB jittered-cluttered dataset.
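The two aggregation functions compared above differ only in how each window is reduced. The sketch below (assumed for illustration, not the paper's code) contrasts max pooling with average-based subsampling over non-overlapping 2×2 windows on a 2D feature map.

```python
# Illustrative sketch: max pooling vs. average subsampling over
# non-overlapping 2x2 windows of a 2D feature map.
import numpy as np

def pool2x2(feature_map, mode="max"):
    """Aggregate non-overlapping 2x2 windows of a 2D array."""
    h, w = feature_map.shape
    # Crop to even dimensions, then reshape so each 2x2 window
    # occupies its own pair of axes.
    windows = feature_map[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return windows.max(axis=(1, 3))     # max pooling
    return windows.mean(axis=(1, 3))        # average subsampling
```

Max pooling keeps the strongest activation per window, while averaging blends all four; on sparse activations the two can differ substantially, which is one intuition for the empirical gap reported above.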
Registration is an important step when processing three-dimensional (3-D) point clouds. Applications for registration range from object modeling and tracking to simultaneous localization and mapping (SLAM). This article presents the open-source Point Cloud Library (PCL) and the tools it provides for point cloud registration. The PCL incorporates methods for the initial alignment of point clouds using a variety of local shape feature descriptors, as well as methods for refining initial alignments using different variants of the well-known iterative closest point (ICP) algorithm. This article provides an overview of registration algorithms, usage examples of their PCL implementations, and tips for their application. Since choosing and parameterizing the right algorithm for a particular type of data is one of the biggest problems in 3-D point cloud registration, we present three complete examples of data (and applications) and the respective registration pipeline in the PCL. These examples include dense red-green-blue-depth (RGB-D) point clouds acquired by consumer color and depth cameras, high-resolution laser scans from commercial 3-D scanners, and low-resolution sparse point clouds captured by a custom lightweight 3-D scanner on a micro aerial vehicle (MAV).

Registration of 3-D Point Clouds

The problem of consistently aligning two or more point clouds, i.e., sets of 3-D points, is at the core of 3-D registration. Often the point clouds are acquired by 3-D sensors from different viewpoints. Registration finds the relative pose (position and orientation) between views in a global coordinate frame, such that the overlapping areas between the point clouds match as well as possible; for two examples of registration see Figure 1. The overall objective of registration is to align individual point clouds and fuse them into a single point cloud so that subsequent processing steps can operate on a consistent model of the scene.
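To make the ICP idea concrete, here is a deliberately simplified point-to-point ICP loop. This is a from-scratch sketch, not the PCL implementation: correspondences are brute-force nearest neighbors, and each iteration's best rigid transform is computed in closed form with the SVD-based (Kabsch) solution.

```python
# Simplified point-to-point ICP sketch (not the PCL implementation).
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form R, t minimizing ||R @ src_i + t - dst_i|| (Kabsch)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

def icp(source, target, iters=20):
    """Align source to target; returns the transformed source points."""
    src = source.copy()
    for _ in range(iters):
        # Nearest-neighbor correspondences (brute force for clarity;
        # real implementations use a k-d tree).
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        matched = target[d.argmin(axis=1)]
        R, t = best_rigid_transform(src, matched)
        src = src @ R.T + t
    return src
```

Starting from a small misalignment, where nearest neighbors already give the correct correspondences, the loop converges in a single iteration; the initial-alignment methods mentioned above exist precisely to bring the clouds into such a basin of convergence first.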
Abstract. Real-time 3D perception of the surrounding environment is a crucial precondition for the reliable and safe operation of mobile service robots in domestic environments. Using an RGB-D camera, we present a system for acquiring and processing 3D (semantic) information at frame rates of up to 30 Hz that allows a mobile robot to reliably detect obstacles and segment graspable objects and supporting surfaces as well as the overall scene geometry. Using integral images, we compute local surface normals. The points are then clustered, segmented, and classified in both normal space and spherical coordinates. The system is tested in different setups in a real household environment. The results show that the system reliably detects obstacles at high frame rates, even obstacles that move fast or barely protrude from the ground. The segmentation of all planes in the 3D data even allows for correcting characteristic measurement errors and for reconstructing the original scene geometry at far ranges.
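The normal computation exploits that an RGB-D cloud is organized as an image grid. The following sketch (assumed for illustration, not the paper's integral-image implementation) shows the underlying idea: at each pixel, take the cross product of the horizontal and vertical tangent vectors of the organized point cloud; integral images then let the paper average such estimates over windows in constant time.

```python
# Illustrative sketch: per-pixel surface normals for an organized
# point cloud (H x W x 3) via cross products of tangent vectors.
import numpy as np

def organized_normals(points):
    """Unit normals from central differences; border pixels stay zero."""
    normals = np.zeros_like(points)
    # Tangent vectors along image columns (x) and rows (y).
    dx = points[1:-1, 2:] - points[1:-1, :-2]
    dy = points[2:, 1:-1] - points[:-2, 1:-1]
    n = np.cross(dx, dy)
    norm = np.linalg.norm(n, axis=2, keepdims=True)
    # Normalize, avoiding division by zero for degenerate pixels.
    normals[1:-1, 1:-1] = n / np.where(norm > 0, norm, 1.0)
    return normals
```

For a flat, axis-aligned patch the estimated normals point straight along the z-axis, which makes the sign convention easy to check.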