Abstract: Orientation estimation is a crucial part of robotics tasks such as motion control, autonomous navigation, and 3D mapping. In this paper, we propose a robust visual-based method to estimate robots’ drift-free orientation with RGB-D cameras. First, we detect and track hybrid features (i.e., plane, line, and point) from color and depth images, which provides reliable constraints even in uncharacteristic environments with low texture or no consistent lines. Then, we construct a cost function based on these feature…
“…We extract the MW axes from the first frame by utilizing the plane normal vectors and the parallel lines’ vanishing directions (VDs), the details of which are given in our previous work [32]. To extract the accurate plane normals, we use the normal vectors obtained by the previous fast plane extraction method as the initial value and then perform the mean shift algorithm in the tangent plane of the unit sphere to get the final plane normal vectors, as shown in Figure 4.…”
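The mean-shift refinement described above can be sketched in a few lines of NumPy. This is an illustrative approximation, not the paper's exact implementation: it assumes per-pixel unit normals for the plane region are available, and the Gaussian kernel bandwidth and iteration count are arbitrary choices.

```python
import numpy as np

def refine_normal_mean_shift(normals, init, bandwidth=0.2, iters=10):
    """Refine a plane normal by mean shift on the unit sphere.

    normals : (N, 3) array of per-pixel unit normals from the plane region.
    init    : (3,) initial normal from the fast plane extraction step.
    The shift is computed in the tangent plane at the current estimate,
    then the result is projected back onto the sphere.
    """
    n = init / np.linalg.norm(init)
    for _ in range(iters):
        # Project the sample normals into the tangent plane at n.
        proj = normals - np.outer(normals @ n, n)
        dist = np.linalg.norm(proj, axis=1)
        w = np.exp(-(dist / bandwidth) ** 2)   # Gaussian kernel weights
        shift = (w[:, None] * proj).sum(0) / w.sum()
        n = n + shift
        n /= np.linalg.norm(n)                 # back onto the unit sphere
    return n
```

In practice the fixed iteration count would be replaced by a convergence test on the shift magnitude.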
Pose estimation and map reconstruction are basic requirements for robotic autonomous behavior. In this paper, we propose a point–plane-based method to simultaneously estimate the robot’s poses and reconstruct the current environment’s map using RGB-D cameras. First, we detect and track the point and plane features from color and depth images, and reliable constraints are obtained, even for low-texture scenes. Then, we construct cost functions from these features, and we utilize the plane’s minimal representation to minimize these functions for pose estimation and local map optimization. Furthermore, we extract the Manhattan World (MW) axes on the basis of the plane normals and vanishing directions of parallel lines for the MW scenes, and we add the MW constraint to the point–plane-based cost functions for more accurate pose estimation. The results of experiments on public RGB-D datasets demonstrate the robustness and accuracy of the proposed algorithm for pose estimation and map reconstruction, and we show its advantages compared with alternative methods.
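The "minimal representation" of a plane mentioned above refers to parameterizing a plane with exactly three numbers instead of the over-parameterized Hessian form (unit normal plus distance, four numbers with a unit-norm constraint). One common choice, shown here as a hedged sketch (the paper may use a different parameterization), is azimuth/elevation angles of the normal plus the distance:

```python
import numpy as np

def plane_to_minimal(n, d):
    """Hessian form (unit normal n, distance d) -> minimal 3-vector
    (azimuth, elevation, distance), one common minimal parameterization
    used so optimizers can update planes without extra constraints."""
    theta = np.arctan2(n[1], n[0])            # azimuth of the normal
    phi = np.arcsin(np.clip(n[2], -1.0, 1.0)) # elevation of the normal
    return np.array([theta, phi, d])

def minimal_to_plane(p):
    """Inverse mapping back to the Hessian form."""
    theta, phi, d = p
    n = np.array([np.cos(phi) * np.cos(theta),
                  np.cos(phi) * np.sin(theta),
                  np.sin(phi)])
    return n, d
```

With this mapping the optimizer works on an unconstrained 3-vector, and the unit-norm constraint on the normal is satisfied by construction after each update.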
“…The accuracy of rotation motion estimation is improved using the density distribution of direction vectors and surface normal vectors. Guo et al. [28] use a cost function composed of point, line, and plane features to estimate the rotation during tracking. The keyframe rotation is refined by aligning the currently extracted MW axes with the global MW axes, while Li et al. [25] use a convolutional neural network (CNN) to predict surface normals from the RGB image, replacing the role of the depth camera.…”
Section: Structural Regularity
“…Straight lines corresponding to each dominant direction in the space are no longer parallel in the image after projective transformation, but intersect at a vanishing point (VP) [21]. Some previous works exploit the structural regularity of the MW with monocular [22][23][24][25], stereo [26], and RGB-D cameras [27,28], essentially using the orthogonality of vanishing points to compute an accurate rotation or to constrain the relative rotation between frames. These works show that structural features can eliminate the accumulated rotation drift of the system.…”
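The orthogonality-based rotation computation mentioned in this snippet can be illustrated with a short sketch, under the assumption that three approximately orthogonal vanishing directions have already been extracted: stacking them as matrix columns and projecting onto the rotation group via SVD (the orthogonal Procrustes solution) yields the nearest proper rotation.

```python
import numpy as np

def rotation_from_vds(v1, v2, v3):
    """Nearest rotation matrix to a triplet of approximately orthogonal
    unit vanishing directions, via SVD (Procrustes projection)."""
    M = np.column_stack([v1, v2, v3])
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:   # enforce a proper rotation (det = +1)
        U[:, -1] *= -1
        R = U @ Vt
    return R
```

Because the vanishing directions are noisy and only approximately orthogonal, this projection step is what turns them into a valid rotation estimate.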
Based on the hypothesis of the Manhattan world, we propose a tightly-coupled monocular visual-inertial odometry (VIO) system that combines structural features with point features and can run on a mobile phone in real time. The back-end optimization is based on the sliding-window method to improve computing efficiency. As the Manhattan world is abundant in man-made environments, this regularity lets structural features encode the orthogonality and parallelism concealed in buildings to eliminate accumulated rotation error. We define a structural feature as an orthogonal basis composed of three orthogonal vanishing points in the Manhattan world. Meanwhile, to extract structural features in real time on the mobile phone, we propose a fast structural-feature extraction method based on the known vertical dominant direction. Our experiments on public datasets and a self-collected dataset show that our system is superior to most existing open-source systems, especially in situations where the images are texture-less, dark, or blurry.
“…Recently, vision-based simultaneous localization and mapping (V-SLAM) techniques have become more and more popular due to the need for autonomous navigation of mobile robots [1,2]. Front-end feature-point detection and feature matching are especially important because their accuracy significantly influences the performance of back-end visual odometry, mapping, and pose estimation [3,4]. Among front-end schemes, although speeded-up robust features (SURF) offer faster operation, their accuracy is worse than that of the scale-invariant feature transform (SIFT) [5,6].…”
In this paper, we propose an FPGA-based enhanced SIFT with feature matching for stereo vision. Gaussian blur and difference-of-Gaussian pyramids are realized in parallel to accelerate the processing time required for multiple convolutions. For the feature descriptor, a simple triangular identification approach with a look-up table is proposed to efficiently determine the direction and gradient of the feature points; the dimension of the feature descriptor is thus reduced by half compared with conventional approaches. As for feature detection, the condition for high-contrast detection is simplified by moderately changing a threshold value, which also reduces the resulting hardware cost in realization. The proposed enhanced SIFT not only accelerates operation but also reduces hardware cost. The experimental results show that the proposed enhanced SIFT reaches a frame rate of 205 fps for 640 × 480 images. Integrating two enhanced-SIFT cores, a finite-area parallel checking scheme is also proposed, without the aid of external memory, to improve the efficiency of feature matching. The resulting frame rate of the proposed stereo-vision matching can be as high as 181 fps with good matching accuracy, as demonstrated in the experimental results.
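The look-up-table direction test can be illustrated in software. This sketch is an assumption about the general technique (the paper's exact table and bin layout are not given here): the gradient's orientation octant is determined from two sign tests and one magnitude comparison, avoiding a full arctangent, which is what makes a small LUT sufficient in hardware.

```python
def gradient_octant(dx, dy):
    """Classify a gradient vector (dx, dy) into one of 8 orientation
    bins (counter-clockwise from the positive x-axis) using only sign
    tests and one magnitude comparison -- a software analogue of a
    LUT-style direction test used in place of a full arctangent."""
    # Build a 3-bit code: sign of dy, sign of dx, |dy| vs |dx|.
    code = ((4 if dy < 0 else 0)
            | (2 if dx < 0 else 0)
            | (1 if abs(dy) > abs(dx) else 0))
    # Map the 3-bit code to its counter-clockwise octant index.
    lut = {0: 0, 1: 1, 3: 2, 2: 3, 6: 4, 7: 5, 5: 6, 4: 7}
    return lut[code]
```

In an FPGA realization the three comparisons and the table lookup map directly onto comparators and a small ROM, with no divider or CORDIC unit required.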