Abstract-This paper presents a vision-based approach to SLAM in large-scale environments with minimal sensing and computational requirements. The approach is based on a graphical representation of robot poses and links between the poses. Links between robot poses are established based on odometry and image similarity, and a relaxation algorithm is then used to generate a globally consistent map. To estimate the covariance matrix for links obtained from the vision sensor, a novel method is introduced based on the relative similarity of neighbouring images, without requiring distances to image features or multiple-view geometry. Indoor and outdoor experiments demonstrate that the approach scales well to large-scale environments, producing topologically correct and geometrically accurate maps at minimal computational cost. Mini-SLAM was found to produce consistent maps in an unstructured, large-scale environment (with a total path length of 1.4 km) containing indoor and outdoor passages.
I. INTRODUCTION

This paper presents a new approach to the problem of simultaneous localization and mapping (SLAM). The suggested method is called "Mini-SLAM" since it is minimalistic in several ways. On the hardware side, it relies solely on odometry and an omnidirectional camera. Using a camera as the external source of information for the SLAM algorithm provides a much cheaper solution than state-of-the-art 2D and 3D laser scanners, which typically also offer a longer range. It is further known that vision can enable solutions in highly cluttered environments where SLAM algorithms based on laser range scanners fail [1]. In this paper, the readings from a laser scanner are used to demonstrate the consistency of the created maps. Please note, however, that the laser scanner is only utilised for visualisation and is not used in the visual SLAM algorithm.

Apart from the frugal hardware requirements, the method is also minimalistic in its computational demands. Map estimation is performed online by a linear-time SLAM algorithm that operates on an efficient graph representation. The main difference to other vision-based SLAM approaches is that no estimate of the positions of a set of landmarks is involved, which enables the algorithm to scale better with the size of the environment. Instead, a measure of image similarity is used to estimate the pose difference between corresponding images and the uncertainty of this estimate. Given these "visual relations" and "odometry relations"