Abstract-The use of a multi-camera system enables a robot to obtain a surround view, and thus, maximize its perceptual awareness of its environment. An accurate calibration is a necessary prerequisite if vision-based simultaneous localization and mapping (vSLAM) is expected to provide reliable pose estimates for a micro aerial vehicle (MAV) with a multi-camera system. On our MAV, we set up each camera pair in a stereo configuration. We propose a novel vSLAM-based self-calibration method for a multi-sensor system that includes multiple calibrated stereo cameras and an inertial measurement unit (IMU). Our self-calibration estimates the transform with metric scale between each camera and the IMU. Once calibrated, the MAV is able to estimate its global pose via a multi-camera vSLAM implementation based on the generalized camera model. We propose a novel minimal and linear 3-point algorithm that uses inertial information to recover the relative motion of the MAV with metric scale. Our constant-time vSLAM implementation with loop closures runs on-board the MAV in real-time. To the best of our knowledge, no published work has demonstrated real-time on-board vSLAM with loop closures. We show experimental results in both indoor and outdoor environments. The code for both the self-calibration and vSLAM is available as a set of ROS packages at https://github.com/hengli/vmav-ros-pkg.
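To make the 3-point idea concrete, the following is a minimal sketch of a linear solver for the relative translation given a known relative rotation (e.g. integrated from IMU gyroscope measurements). It assumes rays from the generalized (multi-)camera are expressed in Pluecker coordinates and uses the standard generalized epipolar constraint; the function name solve_translation_3pt and the least-squares formulation are illustrative assumptions, not the paper's exact implementation (which ships with the linked ROS packages).

import numpy as np

def solve_translation_3pt(R, rays1, rays2):
    # Sketch of a linear 3-point translation solver for a generalized
    # camera with known relative rotation R (3x3), as assumed above.
    #
    # Each ray is a Pluecker pair (q, q'): q is the unit direction and
    # q' = p x q for a point p on the ray (p encodes the camera's offset
    # in the rig, which is what supplies metric scale).
    #
    # Generalized epipolar constraint (one scalar equation per match):
    #   q2^T [t]_x R q1 + q2^T R q1' + q2'^T R q1 = 0
    # With R known, q2^T [t]_x (R q1) = t . ((R q1) x q2), so each
    # correspondence is linear in t; three matches give the minimal case.
    A, b = [], []
    for (q1, q1p), (q2, q2p) in zip(rays1, rays2):
        Rq1 = R @ q1
        A.append(np.cross(Rq1, q2))
        b.append(-(q2 @ (R @ q1p) + q2p @ Rq1))
    # Exactly 3 correspondences -> 3x3 system; more -> least squares.
    t, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return t

Because the Pluecker moments q' carry the known inter-camera baselines of the rig, the recovered t comes with metric scale, which is why three correspondences suffice once the rotation is taken from the IMU.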
I. INTRODUCTION

Vision-based MAVs are more versatile than laser-based MAVs. Whereas a laser provides only geometry data, a camera provides both geometry data, via stereo and structure-from-motion techniques, and appearance data. A camera is a passive sensor, whereas a laser is an active sensor and thus susceptible to interference. Furthermore, a camera is lighter and has a smaller footprint. However, utmost care has to be taken when choosing the camera configuration for a vision-based MAV that is expected to operate robustly in challenging environments. A single-camera configuration offers limited perceptual awareness, and in turn, imposes flight constraints: if the camera observes too few features for some time, localization can fail and lead to a crash. In the case of a single downward-looking camera [32], the MAV cannot fly too close to the ground, which often has little texture, and at the same time, it cannot perform obstacle avoidance due to the absence of a forward-looking camera. In the case of a forward-looking camera, constraints are imposed on path planning. For example, in [24], any planned path ensures that a sufficient number of features are visible for localization. Similarly, in [11], a path is planned such that the MAV does not move outside the camera's field of view and inadvertently crash into an unseen obstacle.

In this paper, we use a multi-camera system on a MAV; together with the use of fish-eye lenses, this camera configuration provides a surround view of the vicinity. Our multi-camera