Robust and efficient visual localization is essential for many robotic applications. It remains a challenging problem, however, especially when significant environmental or perspective changes are present, because a high percentage of outliers, i.e., incorrect feature matches, arise between the query image and the map. In this paper, we propose a novel 2-entity RANSAC framework that uses 3D-2D point and line feature matches for visual localization with the aid of inertial measurements, and we derive minimal closed-form solutions using only 1 point and 1 line, or 2 points, for both monocular and multi-camera systems. The proposed 2-entity RANSAC achieves higher robustness against outliers because multiple types of features are utilized and the number of matches needed to compute a pose is reduced. Furthermore, we propose a learning-based sampling strategy selection mechanism and a feature scoring network to adapt to different environmental characteristics, such as structured and unstructured scenes. Finally, both simulation and real-world experiments are performed to validate the robustness and effectiveness of the proposed method in scenarios with long-term and perspective changes.

Index Terms: Camera pose estimation, random sample consensus (RANSAC), robust localization.
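As a rough illustration (not part of the paper itself), the standard RANSAC iteration-count formula shows why a smaller minimal sample size improves robustness: the number of iterations needed to draw at least one all-inlier sample with a given confidence grows quickly with the sample size when the outlier ratio is high. The sketch below assumes a generic RANSAC setting; the outlier ratio and confidence values are illustrative, not taken from the paper's experiments.

```python
import math

def ransac_iterations(outlier_ratio, sample_size, confidence=0.99):
    """Iterations needed so that, with probability `confidence`, at least
    one drawn minimal sample of `sample_size` matches is outlier-free.
    Standard formula: N = log(1 - p) / log(1 - w^s), with inlier ratio w."""
    inlier_ratio = 1.0 - outlier_ratio
    p_all_inlier_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence)
                     / math.log(1.0 - p_all_inlier_sample))

# With 70% outliers, a 2-entity minimal solver needs far fewer
# iterations than solvers requiring 3 or 4 matches.
for s in (2, 3, 4):
    print(f"sample size {s}: {ransac_iterations(0.7, s)} iterations")
```

With a 70% outlier ratio, the required iteration count drops from several hundred for 3- or 4-match solvers to under fifty for a 2-entity solver, which is the core efficiency argument behind reducing the minimal sample size.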
I. INTRODUCTION

Localization is a fundamental capability for mobile robots, with applications to driverless cars, unmanned aerial vehicles, and so on [1], [2]. Visual localization has attracted increasing attention because cameras are low-cost, lightweight, and versatile compared with Light Detection and Ranging (LiDAR). The general idea of visual localization is to recover the translation and rotation of the query camera based on feature matches using various descriptors (e.g. FAST [3], SIFT