Abstract: Simultaneous localization and mapping (SLAM) is a fundamental problem for various applications. For indoor environments, planes are predominant features that are less affected by measurement noise. In this paper, we propose a novel point-plane SLAM system using RGB-D cameras. First, we extract feature points from RGB images and planes from depth images. Then plane correspondences in the global map can be found using their contours. Considering the limited size of real planes, we exploit constraints of plane ed…
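The plane-extraction step mentioned above can be approximated with a standard RANSAC fit on the depth-derived point cloud. This is a generic sketch, not the paper's actual method (which additionally matches plane contours and edge constraints); the function name and thresholds are illustrative assumptions:

```python
# Illustrative RANSAC plane fit on a 3-D point cloud (not the paper's method).
import numpy as np

def fit_plane_ransac(points, n_iters=200, inlier_thresh=0.01, rng=None):
    """Return (normal, d) of the plane n.x + d = 0 with the most inliers."""
    rng = np.random.default_rng(rng)
    best_inliers, best_model = 0, None
    for _ in range(n_iters):
        # Sample three distinct points and form a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:            # degenerate (near-collinear) sample
            continue
        n /= norm
        d = -n @ p0
        # Count points within the inlier distance of the candidate plane.
        dist = np.abs(points @ n + d)
        inliers = int((dist < inlier_thresh).sum())
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (n, d)
    return best_model
```

In a full pipeline this fit would be run repeatedly, removing each plane's inliers before extracting the next plane.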
“…A recent RGB-D SLAM was proposed in [144] where points and planes are exploited to estimate the pose of a camera and a map of its surroundings. ORB features are extracted from RGB frames and handled by the RGB-D version of ORB-SLAM2.…”
Visual simultaneous localization and mapping (SLAM) has attracted considerable attention over the past few years. In this paper, a comprehensive survey of state-of-the-art feature-based visual SLAM approaches is presented. The reviewed approaches are classified based on the visual features observed in the environment. Visual features can be seen at different levels: low-level features such as points and edges, middle-level features such as planes and blobs, and high-level features such as semantically labeled objects. One of the most critical research gaps identified by this study is the lack of generality among visual SLAM approaches. Some approaches exhibit a very high level of maturity in terms of accuracy and efficiency, yet they are tailored to very specific environments, such as feature-rich and static ones. When operating in different environments, such approaches experience severe degradation in performance. In addition, due to software and hardware limitations, guaranteeing a robust visual SLAM approach is extremely challenging. Although semantics have been heavily exploited in visual SLAM, scene understanding that incorporates relationships between features is not yet fully explored. A detailed discussion of these research challenges is provided throughout the paper.
“…Visual SLAM systems observe landmarks from different poses and construct constraints to solve for the camera poses and the landmarks’ locations [1]. Conventional visual SLAM systems generally use point features [8, 9, 20], line features [12], and plane features [10, 11] as landmarks. These low-dimensional geometric representations help the robot localize itself, while the lack of semantic information in the map limits the mobile robot’s ability to understand the environment.…”
Section: Related Work
“…Conventional visual SLAM uses the descriptors of point features [6, 7], or the geometric differences of line and plane features [10, 11, 12], to track and solve data associations of observations between frames. Once associations are solved at the front end, they are fixed during optimization at the back end.…”
Section: Related Work
“…A life-long robot needs to adapt to illumination changes, sensor noise, and significant viewing-angle changes, all of which impose requirements on the robustness of the landmarks in its map. Existing visual SLAM technology relies on feature descriptors [6, 7], surface textures [8, 9], or the geometric positions of plane and line structures [10, 11, 12] to solve data associations. However, hand-crafted descriptors are difficult to adapt to significant viewing-angle changes and are easily disturbed by illumination and sensor noise.…”
Indoor service robots need to build an object-centric semantic map to understand and execute human instructions. Conventional visual simultaneous localization and mapping (SLAM) systems build a map using geometric features such as points, lines, and planes as landmarks, but they lack a semantic understanding of the environment. This paper proposes an object-level semantic SLAM algorithm based on RGB-D data, which uses a quadric surface as an object model to compactly represent an object’s position, orientation, and shape. The paper proposes and derives two types of RGB-D camera-quadric observation models: a complete model and a partial model. The complete model combines object detection and point cloud data to estimate a complete ellipsoid in a single RGB-D frame. The partial model is activated when depth data is severely missing because of illumination or occlusion, and uses bounding boxes from object detection to constrain objects. Compared with state-of-the-art quadric SLAM algorithms that use a monocular observation model, the RGB-D observation model reduces the required number of observations and viewing-angle changes, which improves accuracy and robustness. The paper introduces a nonparametric pose graph to solve data associations at the back end and innovatively applies it to the quadric surface model. We thoroughly evaluated the algorithm on two public datasets and an author-collected mobile robot dataset in a home-like environment, obtaining clear improvements in localization accuracy and mapping quality compared with two state-of-the-art object SLAM algorithms.
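As a rough illustration of the "complete model" idea, a point cloud segmented for one object can be summarized by a covariance ellipsoid (center, axis lengths, and orientation). This is a simplified stand-in for the paper's quadric estimation; the function name and the `n_sigma` scaling are assumptions made here for illustration:

```python
# Illustrative covariance-ellipsoid summary of an object's point cloud.
import numpy as np

def ellipsoid_from_points(points, n_sigma=2.0):
    """Return (center, axis_lengths, rotation) of a covariance ellipsoid.

    axis_lengths are sorted ascending; rotation's columns are the
    corresponding principal axes (eigenvectors of the covariance).
    """
    center = points.mean(axis=0)
    cov = np.cov((points - center).T)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    axes = n_sigma * np.sqrt(np.maximum(eigvals, 0.0))
    return center, axes, eigvecs
```

The resulting (center, axes, rotation) triple is exactly the kind of compact position/orientation/shape description that a quadric landmark stores.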
“…Modern technologies enable the development of new and more sophisticated sensors, such as stereo cameras, RGB-D sensors, and laser range finders (LRFs), which are often paired with powerful processing units to capture and process information-rich data in real time. Due to their increasing popularity and ubiquity, they are indispensable in mobile robotics applications [1, 2, 3, 4]. Processing the data obtained from these sensors is computationally demanding; it is therefore necessary to use processing methods that support real-time operation, such as vision-based high-speed driving [5] or visual inspection in manufacturing processes [6].…”
This paper presents an approach to depth image segmentation based on the Evolving Principal Component Clustering (EPCC) method, which exploits data locality in an ordered data stream. The parameters of the linear prototypes used to describe different clusters are estimated recursively. The main contribution of this work is the extension and application of EPCC to 3-D space for recursive, real-time detection of flat connected surfaces based on linear segments, all detected in an evolving way. To obtain optimal results when processing homogeneous surfaces, we introduce two-step filtering for outlier detection within the clustering framework and account for the sensor noise model, which compensates for the characteristic uncertainties in depth measurements. The developed algorithm was compared with well-known point cloud segmentation methods. The proposed approach achieves better segmentation results at longer distances, where the signal-to-noise ratio is low, without prior filtering of the data. On the given database, an average detection rate above 90% was obtained for flat surfaces, indicating high performance when processing large point clouds in a non-iterative manner.
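The recursive estimation of a linear prototype can be sketched with a Welford-style streaming update of a cluster's mean and scatter matrix, from which the principal direction is re-read at any time without revisiting old data. This is a minimal illustration of the idea only, not EPCC itself, which additionally performs locality tests, two-step outlier filtering, and noise compensation; the class and method names are assumptions:

```python
# Minimal streaming (recursive) PCA prototype: one-point-at-a-time updates
# of mean and scatter, as in Welford's online covariance algorithm.
import numpy as np

class RecursivePCAPrototype:
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))   # running sum of outer products

    def update(self, x):
        """Incorporate one new point into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Welford update: uses both the old and new deviation from the mean.
        self.scatter += np.outer(delta, x - self.mean)

    def principal_direction(self):
        """Eigenvector of the sample covariance with the largest variance."""
        cov = self.scatter / max(self.n - 1, 1)
        eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order
        return eigvecs[:, -1]
```

The Welford form is preferred over naively accumulating sums of squares because it stays numerically stable as the stream grows.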