Initial position estimation in global maps, a prerequisite for accurate localization, plays a critical role in mobile robot navigation. Global positioning system signals often become unreliable in disaster sites or indoor areas, so other localization methods are needed to support search-and-rescue robots. Many visual approaches focus on estimating a robot's position within prior maps acquired with cameras. In contrast to conventional methods, which need a coarse initial position estimate to precisely localize a camera in a given map, we propose a novel approach that estimates the initial position of a monocular camera within a given 3D light detection and ranging (LiDAR) map using a convolutional neural network, with no retraining required. It enables a mobile robot to coarsely estimate its position in 3D maps with only a monocular camera. The key idea of our work is to use depth information as intermediate data to retrieve a camera image within immense point clouds. We employ an unsupervised learning framework to predict depth from a single image. We then use a pretrained convolutional neural network to generate depth image descriptors that represent places. We retrieve the position by computing similarity scores between the current depth image and depth images projected from the 3D map. Experiments on the publicly available KITTI data sets demonstrate the efficiency and feasibility of the presented algorithm.
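A minimal sketch of the retrieval step described above, assuming an ImageNet-pretrained ResNet-18 as the fixed descriptor network and cosine similarity as the score; the abstract does not name the backbone or the similarity measure, so both are illustrative choices, as is replicating the depth map to three channels.

```python
# Sketch: retrieve the map depth image most similar to the current
# (network-predicted) depth image using a fixed pretrained CNN as a
# descriptor extractor. Backbone and metric are assumptions, not the
# paper's stated configuration.
import torch
import torch.nn.functional as F
import torchvision.models as models

# Pretrained CNN used as-is (no retraining); drop the classifier head
# so the network outputs a pooled feature vector.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def descriptor(depth_image: torch.Tensor) -> torch.Tensor:
    """depth_image: (1, 3, H, W), a depth map replicated to 3 channels.
    Returns an L2-normalized descriptor of shape (1, D)."""
    return F.normalize(backbone(depth_image), dim=1)

@torch.no_grad()
def retrieve(query_depth: torch.Tensor, map_depths: list) -> int:
    """Return the index of the map-projected depth image whose
    descriptor has the highest cosine similarity to the query."""
    q = descriptor(query_depth)                            # (1, D)
    db = torch.cat([descriptor(d) for d in map_depths])    # (N, D)
    scores = db @ q.t()                                    # (N, 1)
    return int(scores.argmax())
```

Because the descriptors are L2-normalized, the dot product equals cosine similarity, so the highest-scoring database entry is the coarse position estimate.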
Localization information is essential for mobile robot systems in navigation tasks, particularly where the Global Positioning System signal is unreliable. Many visual approaches focus on localizing a robot within prior maps acquired with cameras. In contrast to conventional methods that localize a camera in an image-based map, we propose a novel approach that localizes a monocular camera within a given three-dimensional (3D) light detection and ranging (LiDAR) map. We employ visual odometry to reconstruct a semidense set of 3D points from the monocular camera images. These points are continuously matched against the prior 3D LiDAR map by a modified feature-based point cloud registration method to track the full six-degree-of-freedom camera pose. Because a monocular camera suffers from scale drift due to the lack of depth information, the proposed method adopts an updatable scale estimation. Experiments carried out on a publicly available large-scale data set demonstrate that the camera and LiDAR multimodal data matching problem is solved, and that the localization accuracy of our method is comparable to state-of-the-art approaches.
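The abstract describes estimating a scale factor alongside the rigid alignment between the semidense visual points and the LiDAR map. A minimal sketch of one standard way to do this, assuming point correspondences have already been established: the closed-form Umeyama similarity-transform fit, which recovers scale, rotation, and translation jointly. The paper's actual feature-based registration and scale-update scheme are not reproduced here; this function is illustrative only.

```python
# Sketch: fit a similarity transform (scale s, rotation R, translation t)
# minimizing ||s * R @ src_i + t - dst_i||^2 over matched point pairs,
# via the Umeyama closed-form solution. Estimating s alongside R and t
# is what compensates for monocular scale drift in this setting.
import numpy as np

def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """src, dst: (N, 3) arrays of corresponding 3D points
    (e.g., visual-odometry points and their LiDAR-map matches)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)          # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                          # avoid a reflection
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)   # variance of source cloud
    s = np.trace(np.diag(D) @ S) / var_src    # optimal scale factor
    t = mu_d - s * R @ mu_s
    return s, R, t
```

In a tracking loop, refitting s as new matches arrive gives an updatable scale estimate rather than a one-off correction.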