To address the problem of estimating camera trajectory and to build a structural three-dimensional (3D) map based on inertial measurements and visual observations, this paper proposes point–line visual–inertial odometry (PL-VIO), a tightly-coupled monocular visual–inertial odometry system exploiting both point and line features. Compared with point features, lines provide significantly more geometrical structure information on the environment. To obtain both computation simplicity and representational compactness of a 3D spatial line, Plücker coordinates and orthonormal representation for the line are employed. To tightly and efficiently fuse the information from inertial measurement units (IMUs) and visual sensors, we optimize the states by minimizing a cost function which combines the pre-integrated IMU error term together with the point and line re-projection error terms in a sliding window optimization framework. The experiments evaluated on public datasets demonstrate that the PL-VIO method that combines point and line features outperforms several state-of-the-art VIO systems which use point features only.
The present Multi-view stereo (MVS) methods with supervised learning-based networks have an impressive performance comparing with traditional MVS methods. However, the ground-truth depth maps for training are hard to be obtained and are within limited kinds of scenarios. In this paper, we propose a novel unsupervised multi-metric MVS network, named M 3 VSNet, for dense point cloud reconstruction without any supervision. To improve the robustness and completeness of point cloud reconstruction, we propose a novel multi-metric loss function that combines pixel-wise and feature-wise loss function to learn the inherent constraints from different perspectives of matching correspondences. Besides, we also incorporate the normal-depth consistency in the 3D point cloud format to improve the accuracy and continuity of the estimated depth maps. Experimental results show that M 3 VSNet establishes the state-of-the-arts unsupervised method and achieves better performance than previous supervised MVSNet on the DTU dataset and demonstrates the powerful generalization ability on the Tanks & Temples benchmark with effective improvement.
This paper proposes a novel deep convolutional model, Tri-Points Based Line Segment Detector (TP-LSD), to detect line segments in an image at real-time speed. The previous related methods typically use the two-step strategy, relying on either heuristic post-process or extra classifier. To realize one-step detection with a faster and more compact model, we introduce the tri-points representation, converting the line segment detection to the end-to-end prediction of a root-point and two endpoints for each line segment. TP-LSD has two branches: tri-points extraction branch and line segmentation branch. The former predicts the heat map of root-points and the two displacement maps of endpoints. The latter segments the pixels on straight lines out from background. Moreover, the line segmentation map is reused in the first branch as structural prior. We propose an additional novel evaluation metric and evaluate our method on Wireframe and YorkUrban datasets, demonstrating not only the competitive accuracy compared to the most recent methods, but also the real-time run speed up to 78 FPS with the 320 × 320 input.
Leveraging line features to improve location accuracy of point-based visual-inertial SLAM (VINS) is gaining importance as they provide additional constraint of scene structure regularity, however, real-time performance has not been focused. This paper presents PL-VINS, a real-time optimizationbased monocular VINS method with point and line, developed based on state-of-the-art point-based VINS-Mono [1]. Observe that current works use LSD [2] algorithm to extract lines, however, the LSD is designed for scene shape representation instead of specific pose estimation problem, which becomes the bottleneck for the real-time performance due to its expensive cost. In this work, a modified LSD algorithm is presented by studying hidden parameter tuning and length rejection strategy. The modified LSD can run three times at least as fast as the LSD. Further, by representing a line landmark with Pl ücker coordinate, the line reprojection residual is modeled as midpointto-line distance then minimized by iteratively updating the minimum four-parameter orthonormal representation of the Pl ücker coordinate. Experiments in public EuRoc benchmark dataset show the location error of our method is down 12-16% compared to VINS-Mono at the same work frequency on a low-
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.