The ability to localise a camera moving in a previously unknown environment is desirable for a wide range of applications. In computer vision this problem is studied as monocular SLAM. Recent years have seen improvements to the usability and scalability of monocular SLAM systems to the point that they may soon find uses outside of laboratory conditions. However, the robustness of these systems to rapid camera motions (we refer to this quality as agility) still lags behind that of tracking systems which use known object models. In this paper we attempt to remedy this. We present two approaches to improving the agility of a keyframe-based SLAM system: firstly, we add edge features to the map and exploit their resilience to motion blur to improve tracking under fast motion; secondly, we implement a very simple inter-frame rotation estimator to aid tracking when the camera is rapidly panning, and demonstrate that this method also enables a trivially simple yet effective relocalisation method. Results show that a SLAM system combining points, edge features and motion initialisation allows highly agile tracking at a moderate increase in processing time.
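As an illustration of the inter-frame rotation estimator described above, the sketch below estimates pan and tilt by brute-force SSD alignment of heavily downsampled (and therefore implicitly blurred) frames, under the small-rotation assumption that pan and tilt appear as near-uniform image translation. The function names and the exhaustive search are illustrative only, not the paper's implementation. The same tiny-image comparison, run against stored keyframes instead of the previous frame, suggests how a trivially simple relocalisation method could work.

```python
import numpy as np

def downsample(img, factor=16):
    """Block-average a grayscale image to a tiny, implicitly blurred version."""
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    small = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return small.mean(axis=(1, 3))

def estimate_rotation(prev, curr, focal_px, factor=16, max_shift=4):
    """Estimate inter-frame pan/tilt (radians) by exhaustive SSD alignment of
    tiny images. One small-image pixel of shift corresponds to `factor`
    full-resolution pixels, i.e. roughly factor / focal_px radians."""
    a, b = downsample(prev, factor), downsample(curr, factor)
    best, best_shift = np.inf, (0, 0)
    m = max_shift
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)
            # Exclude the border so wrapped-around pixels never contribute.
            ssd = np.sum((a[m:-m, m:-m] - shifted[m:-m, m:-m]) ** 2)
            if ssd < best:
                best, best_shift = ssd, (dy, dx)
    dy, dx = best_shift
    # Convert the pixel shift back to small-angle pan/tilt estimates.
    return dx * factor / focal_px, dy * factor / focal_px
```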
We show how a system for video-rate parallel camera tracking and 3D map-building can be readily extended to allow one or more cameras to work in several maps, separately or simultaneously. The ability to handle several thousand features per map at video-rate, and for the cameras to switch automatically between maps, allows spatially localized AR workcells to be constructed and used with very little intervention from the user of a wearable vision system. The user can explore an environment in a natural way, acquiring local maps in real-time. When revisiting those areas the camera will select the correct local map from store and continue tracking and structural acquisition, while the user views relevant AR constructs registered to that map.
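A sketch of the map-switching logic this implies is given below, assuming each stored map can report a relocalisation score for the current frame; `MapStore`, `track`, `reloc_score`, and `activate` are hypothetical names, not the system's API.

```python
class MapStore:
    """Holds several local maps; routes the camera to whichever map
    recognises the current frame, or starts a new map if none does."""

    def __init__(self, reloc_threshold):
        self.maps = []
        self.active = None
        self.threshold = reloc_threshold

    def process_frame(self, frame):
        if self.active is not None and self.active.track(frame):
            return self.active  # normal tracking in the current local map
        # Tracking lost (or no map yet): try to relocalise in every stored map.
        scores = [(m.reloc_score(frame), m) for m in self.maps]
        if scores:
            best_score, best_map = max(scores, key=lambda s: s[0])
            if best_score > self.threshold:
                self.active = best_map
                best_map.activate(frame)  # resume tracking and mapping there
                return best_map
        # Nowhere recognised: begin acquiring a fresh local map.
        self.active = self.new_map(frame)
        self.maps.append(self.active)
        return self.active

    def new_map(self, frame):
        raise NotImplementedError  # map construction is system-specific
```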
This paper demonstrates a real-time, full-3D edge tracker based on a particle filter. In contrast to previous methods this system is capable of tracking complex self-occluding three-dimensional structures. The system exploits graphics hardware in a novel manner, allowing it not only to perform hidden line removal for each particle but also to evaluate pose likelihoods directly on the graphics card. This approach allows video-rate filtering with hundreds of particles on a standard workstation.
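The filtering loop around the GPU likelihood is a standard particle filter; a minimal numpy sketch of one predict-weight-resample cycle follows, with the rendered edge likelihood left as a caller-supplied function, since the paper's contribution is evaluating that likelihood (including hidden line removal) on the graphics card.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, motion_noise, rng):
    """One predict-weight-resample cycle for a pose particle filter.
    particles: (N, 6) array of pose parameters (3 rotation + 3 translation).
    likelihood: maps the (N, 6) pose array to per-particle likelihoods; in
    the paper this is an edge-based score rendered per particle on the GPU."""
    n = len(particles)
    # Predict: diffuse the particles with a random-walk motion model.
    particles = particles + rng.normal(scale=motion_noise, size=particles.shape)
    # Weight: how well does each hypothesised pose explain the image edges?
    weights = weights * likelihood(particles)
    weights = weights / weights.sum()
    # Systematic resampling when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        positions = (np.arange(n) + rng.uniform()) / n
        idx = np.searchsorted(np.cumsum(weights), positions)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights
```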
Monocular SLAM has the potential to turn inexpensive cameras into powerful pose sensors for applications such as robotics and augmented reality. We present a relocalization module for such systems which solves some of the problems encountered by previous monocular SLAM systems: tracking failure, map merging, and loop closure detection. This module extends recent advances in keypoint recognition to determine the camera pose relative to the landmarks within a single frame time of 33 ms. We first show how this module can be used to improve the robustness of these systems. Blur, sudden motion, and occlusion can all cause tracking to fail, leading to a corrupted map. Using the relocalization module, the system can automatically detect and recover from tracking failure while preserving map integrity. Extensive tests show that the system can then reliably generate maps for long sequences even in the presence of frequent tracking failure. We then show that the relocalization module can be used to recognize overlap in maps, i.e., when the camera has returned to a previously mapped area. Having established an overlap, we determine the relative pose of the maps using trajectory alignment so that independent maps can be merged and loop closure events can be recognized. The system combining all of these abilities is able to map larger environments, and for significantly longer periods, than previous systems.
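As an illustration of the trajectory-alignment step used to merge maps, the sketch below computes a closed-form least-squares similarity transform (Umeyama-style) between corresponding camera positions from two maps; a scale factor is included because each monocular map has its own arbitrary scale. This is a standard formulation, not necessarily the paper's exact one.

```python
import numpy as np

def align_trajectories(p, q):
    """Least-squares similarity transform (s, R, t) with s * R @ p_i + t ~ q_i,
    for corresponding camera positions p, q of shape (N, 3).
    Closed-form solution via SVD (Umeyama, 1991)."""
    n = len(p)
    mu_p, mu_q = p.mean(axis=0), q.mean(axis=0)
    pc, qc = p - mu_p, q - mu_q
    cov = qc.T @ pc / n              # cross-covariance of the centred sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0               # guard against a reflection solution
    R = U @ S @ Vt
    var_p = (pc ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_p
    t = mu_q - s * (R @ mu_p)
    return s, R, t
```

Applying (s, R, t) to one map's keyframes and landmarks expresses both maps in a common frame, after which duplicated landmarks can be fused and the loop closure imposed.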
This paper presents novel methods for increasing the robustness of visual tracking systems by incorporating information from inertial sensors. We show that more can be achieved than simply combining the sensor data within a statistical filter. In particular we show how, in addition to using inertial data to provide predictions for the visual sensor, this data can also be used to provide an estimate of motion blur for each feature and this can be used to dynamically tune the parameters of each feature detector in the visual sensor. This allows the system to obtain useful information from the visual sensor even in the presence of substantial motion blur. Finally, the visual sensor can be used to calibrate the parameters of the inertial sensor to eliminate drift.
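A sketch of the per-feature blur prediction this describes, assuming a calibrated gyroscope and a pinhole camera: the rotational optical-flow field gives each feature's image-plane velocity, which, integrated over the exposure, yields a blur streak that the feature's detector or matcher could be tuned against (for instance by blurring its template along that vector). Sign conventions depend on the chosen camera axes.

```python
import numpy as np

def predicted_blur(features, omega, exposure_s, focal_px):
    """Predict per-feature motion-blur vectors (pixels) caused by camera
    rotation during the exposure.
    features: (N, 2) pixel offsets (u, v) from the principal point.
    omega: (3,) gyro angular velocity in rad/s, expressed in the camera frame.
    Uses the standard rotational optical-flow field of a pinhole camera."""
    u, v = features[:, 0], features[:, 1]
    wx, wy, wz = omega
    f = focal_px
    # Image-plane velocity induced by pure rotation (small-angle model).
    du = (u * v / f) * wx - (f + u ** 2 / f) * wy + v * wz
    dv = (f + v ** 2 / f) * wx - (u * v / f) * wy - u * wz
    # Blur streak = image velocity integrated over the exposure time.
    return np.stack([du, dv], axis=1) * exposure_s
```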