Abstract: In practical applications, striking a balance between high accuracy and computational efficiency is a central challenge for simultaneous localization and mapping (SLAM). To address this challenge, we propose SD-VIS, a novel fast and accurate semi-direct visual-inertial SLAM framework that estimates camera motion and the structure of the surrounding sparse scene. In the initialization procedure, we align the pre-integrated IMU measurements with the visual images and calibrate out the metric scale, initi…
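The initialization step the abstract describes, aligning pre-integrated IMU measurements with the up-to-scale visual trajectory to recover metric scale, is conventionally posed as a linear system. The sketch below shows that common formulation, not necessarily SD-VIS's exact one: s is the metric scale, g the (downward) gravity vector, v_k the keyframe velocities, p̄_k the up-to-scale visual positions, R_k the keyframe rotations, and α, β the IMU pre-integrated position and velocity terms over the interval Δt_k.

```latex
% Hedged sketch of the standard visual-inertial alignment (illustrative,
% not necessarily the exact SD-VIS formulation). Unknowns: scale s,
% gravity g, keyframe velocities v_k; known: up-to-scale visual positions
% \bar{p}_k, rotations R_k, and pre-integrated IMU terms \alpha, \beta.
\begin{aligned}
s\,(\bar{\mathbf{p}}_{k+1}-\bar{\mathbf{p}}_{k})
  &= \mathbf{v}_k\,\Delta t_k
   + \tfrac{1}{2}\,\mathbf{g}\,\Delta t_k^{2}
   + \mathbf{R}_k\,\boldsymbol{\alpha}_{k,k+1},\\
\mathbf{v}_{k+1}
  &= \mathbf{v}_k + \mathbf{g}\,\Delta t_k
   + \mathbf{R}_k\,\boldsymbol{\beta}_{k,k+1}.
\end{aligned}
```

Stacking these constraints over consecutive keyframes yields a linear least-squares problem in (s, g, {v_k}).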
“…Traditional visual SLAM can be divided into two classes: feature-based and direct methods. Feature-based methods extract salient image features in each image, match them in successive frames using invariant feature descriptors, robustly recover camera poses and structure using epipolar geometry, and refine poses and structure by minimizing reprojection errors [4]. Despite good performance over the past several years, these feature-based approaches remain very sensitive to noise and outliers, and are time-consuming during feature extraction and matching.…”
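The pipeline quoted above (detect, match, recover pose via epipolar geometry) maps directly onto standard library calls. Below is a minimal, hedged sketch using OpenCV, with ORB standing in for the unspecified invariant descriptor; `img1`, `img2`, and the intrinsic matrix `K` are assumed inputs, not anything taken from the cited paper.

```python
# Hedged sketch of a classic feature-based front end, not the cited
# paper's exact pipeline. Inputs img1, img2 (grayscale) and K (3x3
# intrinsics) are assumptions for illustration.
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    orb = cv2.ORB_create(2000)                       # 1. extract salient features
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)              # 2. match by descriptor

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 3. robustly recover relative pose with epipolar geometry; RANSAC
    # rejects the outliers the quoted passage warns about
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, cv2.RANSAC, 0.999, 1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t                                      # rotation, unit-scale translation
```

Step 4 of the quoted pipeline, refining poses and structure by minimizing reprojection error, would follow as a bundle adjustment over the recovered geometry.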
In practical applications, fusing the complementary strengths of direct and feature-based methods effectively is a main challenge of simultaneous localization and mapping (SLAM). To address this challenge, we propose DO-SLAM, a novel fast and accurate semi-direct visual SLAM framework that retains the speed of the direct method together with the high precision and loop-closure capability of the feature-based method. The direct method serves as the first half of DO-SLAM, tracking the camera pose rapidly and robustly. The feature-based method serves as the second half, refining keyframe poses, performing loop closures, and building a globally consistent, long-term, reusable sparse feature map. The proposed pipeline fuses direct odometry and feature-based SLAM to perform three levels of parallel optimization: (1) in the direct-method module, keyframe poses are estimated by minimizing the photometric error; (2) in the feature-based module, poses computed by inter-frame matching are used to correct and fuse the poses from the direct-method module into initial poses, which are then refined by motion-only bundle adjustment; and (3) pose graph optimization is used to achieve global map consistency in the presence of loop closures. Experimental evaluation on two benchmark datasets demonstrates that the proposed approach achieves higher accuracy and robustness in motion estimation than other state-of-the-art methods.
INDEX TERMS: simultaneous localization and mapping (SLAM), semi-direct SLAM, three levels of parallel optimization
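To make level (1) concrete, here is a minimal sketch of the photometric residual a direct-method module minimizes; it is illustrative only, not DO-SLAM's actual implementation. The reference points, their intensities, the current image, the candidate pose `T`, and the intrinsics `K` are all assumed inputs.

```python
# Hedged sketch of direct (photometric) tracking, level (1) above.
# Not DO-SLAM's implementation; all inputs are assumed for illustration.
import numpy as np

def photometric_residuals(T, ref_points, ref_intensities, cur_img, K):
    """r_i = I_cur(pi(T * p_i)) - I_ref(p_i), for 3D points p_i seen in the
    reference keyframe; minimizing sum r_i^2 over the pose T tracks the
    camera directly, without feature matching."""
    residuals = []
    for p, i_ref in zip(ref_points, ref_intensities):
        p_cur = T[:3, :3] @ p + T[:3, 3]           # point into current frame
        if p_cur[2] <= 0:                           # behind the camera
            continue
        u, v, _ = K @ (p_cur / p_cur[2])            # pinhole projection
        if 0 <= int(u) < cur_img.shape[1] and 0 <= int(v) < cur_img.shape[0]:
            residuals.append(float(cur_img[int(v), int(u)]) - i_ref)
    return np.asarray(residuals)
```

In a full tracker this residual would be minimized over T, e.g., by Gauss-Newton on se(3), before the feature-based module (level 2) refines the result with motion-only bundle adjustment.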
“…For SLAM technology, various systems or platforms have been introduced, such as Lidar systems [5], stereo cameras [6], and RGB-D cameras [7]. Some SLAM-based technologies can contribute to improved mapping accuracy, such as a Pseudo-GNSS/INS module integrated framework with probabilistic SLAM [8], a 2D SLAM system using a low-cost Kinect sensor [9], prediction-based SLAM (P-SLAM) [10], a graph-based hierarchical SLAM framework [11], a semi-direct visual-inertial SLAM framework [12], and a CPU-only pipeline for SLAM [13]. Similar to traditional data fusion technology [14], SLAM with data fusion has also been developed accordingly, such as the fusion of RGB images and Lidar point clouds [15][16][17].…”
Reducing cumulative error is a crucial task in simultaneous localization and mapping (SLAM). Usually, loop closure detection (LCD) is exploited to accomplish this work for SLAM and robot navigation; fast and accurate loop detection can significantly improve global localization stability and reduce mapping errors. However, point-cloud-based LCD still has some problems, such as over-reliance on high-resolution sensors and poor detection efficiency and accuracy. Therefore, in this paper, we propose a novel and fast global LCD method using a low-cost 16-beam Lidar based on a “Simplified Structure”. First, we extract the “Simplified Structure” from the indoor point cloud, classify it into two levels, and manage it hierarchically according to its structural salience; the “Simplified Structure” has simple feature geometry and can be exploited to capture stable indoor structures. Second, we analyze point cloud registration suitability with a pre-match, and present a hierarchical matching strategy with multiple geometric constraints in Euclidean space to match two scans. Finally, we construct a multi-state loop evaluation model for the multi-level structure to determine whether two candidate scans form a loop. In fact, our method also provides a transformation for point cloud registration with the “Simplified Structure” when a loop is detected successfully. Experiments are carried out in three types of indoor environments, with data collected by a 16-beam Lidar. The experimental results demonstrate that our method detects global loop closures efficiently and accurately: the average global LCD precision, accuracy, and negative rate are approximately 0.90, 0.96, and 0.97, respectively.
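The matching-plus-evaluation step described above rests on a standard primitive: estimating a rigid transform between matched points of two scans and judging the residual. The sketch below shows only that primitive (the Kabsch algorithm plus a threshold test); the paper's “Simplified Structure” extraction and multi-state evaluation model are not reproduced, and `src`/`dst` are assumed to be pre-matched Nx3 point arrays with an assumed RMSE threshold.

```python
# Hedged sketch of rigid registration + loop verification, a generic
# stand-in for the paper's matching and evaluation stages. src, dst are
# assumed pre-matched Nx3 arrays; max_rmse is an assumed threshold.
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t with dst ~ R @ src + t (Kabsch algorithm)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # fix a possible reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

def is_loop(src, dst, max_rmse=0.3):
    R, t = rigid_transform(src, dst)
    rmse = np.sqrt(((dst - (src @ R.T + t)) ** 2).sum(1).mean())
    return rmse < max_rmse, (R, t)         # loop decision + reusable transform
```

This mirrors the abstract's observation that a successful loop detection yields the registration transform as a by-product.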
“…Thus, planar structure recognition, which can be formulated as the plane detection problem, has been an important research topic in computer vision for decades. The detected planes, which can be regarded as an abstracted form of the actual scene, contain a great deal of high-level structural information and can benefit many other semantic analysis tasks, such as object detection [1], self-navigation [2], scene segmentation [3], SLAM [4,5], and robot self-localization [6,7,8]. For instance, a robot can better map its current environment with the plane detection result, which significantly reduces the uncertainty in the mapping results and improves positioning accuracy.…”
Real-time consistent plane detection (RCPD) from structured point cloud sequences facilitates various high-level computer vision and robotic tasks. However, it remains a challenge. Existing plane detection techniques suffer from long running times or imprecise detection results; meanwhile, plane labels are not consistent over the image sequence because planes are lost in the detection stage. To resolve these issues, we propose a novel superpixel-based real-time plane detection approach that keeps plane labels consistent across frames. In summary, our method makes the following key contributions: (i) a real-time plane detection algorithm that extracts planes from raw structured three-dimensional (3D) point clouds collected by depth sensors; (ii) a superpixel-based segmentation method that makes each detected plane match its actual boundary; and (iii) a robust strategy that recovers missing planes by utilizing contextual correspondence information in adjacent frames. Extensive visual and numerical experiments demonstrate that our method outperforms state-of-the-art methods in terms of efficiency and accuracy.
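A brief sketch of the geometric primitive underlying such a detector may help: a least-squares plane fit, plus a crude frame-to-frame consistency test of the kind contribution (iii) relies on. This is a generic illustration under assumed tolerances, not the paper's superpixel pipeline.

```python
# Hedged sketch: SVD plane fit and a naive cross-frame consistency check.
# Generic illustration only; angle_tol and dist_tol are assumed values.
import numpy as np

def fit_plane(points):
    """Least-squares plane n.x + d = 0 through an Nx3 point set."""
    centroid = points.mean(0)
    _, _, Vt = np.linalg.svd(points - centroid)
    n = Vt[-1]                              # direction of least variance
    return n, -n @ centroid

def same_plane(p1, p2, angle_tol=np.deg2rad(5.0), dist_tol=0.05):
    """Planes from adjacent frames match when normals and offsets agree."""
    (n1, d1), (n2, d2) = p1, p2
    if n1 @ n2 < 0:                         # resolve normal sign ambiguity
        n2, d2 = -n2, -d2
    ang = np.arccos(np.clip(n1 @ n2, -1.0, 1.0))
    return ang < angle_tol and abs(d1 - d2) < dist_tol
```

A consistency-preserving detector would run such a test between the current frame's planes and those tracked from previous frames, re-assigning labels (and recovering lost planes) when a match is found.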