We study how autonomous robots can learn by themselves to improve their depth estimation capability. In particular, we investigate a self-supervised learning setup in which stereo vision depth estimates serve as targets for a convolutional neural network (CNN) that transforms a single still image to a dense depth map. After training, the stereo and mono estimates are fused with a novel fusion method that preserves high confidence stereo estimates, while leveraging the CNN estimates in the low-confidence regions. The main contribution of the article is that it is shown that the fused estimates lead to a higher performance than the stereo vision estimates alone. Experiments are performed on the KITTI dataset, and on board of a Parrot SLAMDunk, showing that even rather limited CNNs can help provide stereo vision equipped robots with more reliable depth maps for autonomous navigation.
Self-Supervised Learning (SSL) is a reliable learning mechanism in which a robot uses an original, trusted sensor cue for training to recognize an additional, complementary sensor cue. We study for the first time in SSL how a robot's learning behavior should be organized, so that the robot can keep performing its task in the case that the original cue becomes unavailable. We study this persistent form of SSL in the context of a flying robot that has to avoid obstacles based on distance estimates from the visual cue of stereo vision. Over time it will learn to also estimate distances based on monocular appearance cues. A strategy is introduced that has the robot switch from stereo vision based flight to monocular flight, with stereo vision purely used as 'training wheels' to avoid imminent collisions. This strategy is shown to be an effective approach to the 'feedbackinduced data bias' problem as also experienced in learning from demonstration. Both simulations and real-world experiments with a stereo vision equipped AR drone 2.0 show the feasibility of this approach, with the robot successfully using monocular vision to avoid obstacles in a 5 × 5m room. The experiments show the potential of persistent SSL as a robust learning approach to enhance the capabilities of robots. Moreover, the abundant training data coming from the own sensors allows to gather large data sets necessary for deep learning approaches.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.