It is essential for a robot to be able to detect revisits, or loop closures, for long-term visual navigation. A key insight explored in this work is that the loop-closing event inherently occurs sparsely, i.e., the image currently being taken matches only a small subset (if any) of previous images. Based on this observation, we formulate loop-closure detection as a sparse, convex ℓ1-minimization problem. By leveraging fast convex optimization techniques, we are able to find loop closures efficiently, thus enabling real-time robot navigation. This novel formulation requires no offline dictionary learning, as most existing approaches do, and thus allows online incremental operation. Our approach ensures a unique hypothesis by choosing only a single, globally optimal match when making a loop-closure decision. Furthermore, the proposed formulation enjoys a flexible representation: no restriction is imposed on how images are represented, only that the representations be "close" to each other when the corresponding images are visually similar. The proposed algorithm is validated extensively on real-world datasets.
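To make the formulation concrete, here is a minimal sketch of how such a sparse ℓ1-minimization could be solved. This is not the paper's implementation; it assumes image descriptors stacked as columns of a matrix B and solves the standard lasso problem min_x 0.5‖Bx − q‖² + λ‖x‖₁ with plain ISTA (iterative soft-thresholding), where a large entry x[i] suggests the current image q revisits the place of previous image i. The function names and parameter values (lam, n_iters) are illustrative choices.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_loop_closure(B, q, lam=0.1, n_iters=200):
    """Sketch: solve min_x 0.5*||B x - q||^2 + lam*||x||_1 via ISTA.

    B : (d, n) array whose columns are descriptors of previous images.
    q : (d,) descriptor of the current image.
    Returns a sparse coefficient vector x; a large |x[i]| suggests a
    loop closure with previous image i, and an all-(near-)zero x
    suggests the current place has not been seen before.
    """
    n = B.shape[1]
    x = np.zeros(n)
    # Step size 1/L, with L the Lipschitz constant of the gradient,
    # i.e., the squared spectral norm of B.
    L = np.linalg.norm(B, 2) ** 2
    for _ in range(n_iters):
        grad = B.T @ (B @ x - q)
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

Because ISTA only needs matrix-vector products, appending a new column to B as each image arrives supports the online, incremental operation described above; no offline dictionary training is required.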
Introduction

With a growing demand for autonomous robots in a range of applications, such as search and rescue [42,8] and space and underwater exploration [7], it is essential for robots to navigate accurately over extended periods of time in order to accomplish their assigned tasks. To this end, the ability to detect revisits (i.e., loop closure or place recognition) becomes necessary, since it allows the robots to bound the errors and uncertainty in the estimates of their positions and orientations (poses). In this work, we focus in particular on loop closure during visual navigation: given a camera stream, we aim to efficiently determine whether the robot has previously seen the current place.

Even though the problem of loop closure has been extensively studied in the visual-SLAM literature (e.g., see [30,13,21]), the vast majority of existing algorithms require offline training of visual words (a dictionary) from a priori images acquired in visually similar environments. Clearly, this is not always possible when a robot operates in an unknown, drastically different environment. In general, it is difficult to reliably find loops in (visual) appearance space. One particular challenge is perceptual aliasing: images that are similar in appearance may come from different places. To mitigate this issue, both temporal constraints (i.e., loops are only considered closed if other loops are closed nearby) and geometric constraints (i.e., for a loop to be considered closed, a valid transformation must exist between the matched images) can be employed [21]. It is important to point out that the approach of [21] decides on the quality of a match locally: if the match with the highest score (in some distance measure) is far from the second highest, it is consid...
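The local decision rule attributed to [21] above can be sketched as follows. This is a hypothetical illustration, not the cited algorithm: it accepts the best-scoring candidate only when its score is separated from the runner-up by a relative margin, rejecting ambiguous matches that could arise from perceptual aliasing. The margin value is an assumed placeholder.

```python
def is_distinct_match(scores, margin=0.25):
    """Sketch of a local match-quality test (hypothetical threshold).

    scores : iterable of similarity scores, higher is better.
    Accept the best match only if it exceeds the second-best score
    by at least `margin` (as a fraction of the best score).
    """
    s = sorted(scores, reverse=True)
    if len(s) < 2:
        # With a single candidate there is no runner-up to compare against.
        return len(s) == 1
    return (s[0] - s[1]) >= margin * s[0]
```

A clearly dominant match like `is_distinct_match([0.9, 0.4, 0.3])` passes, while two near-tied candidates, e.g. `[0.9, 0.85]`, are rejected as ambiguous. This kind of purely local test is what the paper's single, globally optimal ℓ1 solution is contrasted against.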