Depth completion, the technique of estimating a dense depth image from sparse depth measurements, has a variety of applications in robotics and autonomous driving. However, depth completion faces 3 main challenges: the irregularly spaced pattern in the sparse depth input, the difficulty in handling multiple sensor modalities (when color images are available), as well as the lack of dense, pixel-level ground truth depth labels. In this work, we address all these challenges. Specifically, we develop a deep regression model to learn a direct mapping from sparse depth (and color images) to dense depth. We also propose a self-supervised training framework that requires only sequences of color and sparse depth images, without the need for dense depth labels. Our experiments demonstrate that our network, when trained with semi-dense annotations, attains state-of-theart accuracy and is the winning approach on the KITTI depth completion benchmark 2 at the time of submission. Furthermore, the self-supervised framework outperforms a number of existing solutions trained with semidense annotations.
Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. There has been a significant and growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow for real-time inference on an embedded platform, for instance, mounted on a micro aerial vehicle. In this paper, we address the problem of fast depth estimation on embedded systems. We propose an efficient and lightweight encoder-decoder network architecture and apply network pruning to further reduce computational complexity and latency. In particular, we focus on the design of a low-latency decoder. Our methodology demonstrates that it is possible to achieve similar accuracy as prior work on depth estimation, but at inference speeds that are an order of magnitude faster. Our proposed network, FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of the authors' knowledge, this paper demonstrates real-time monocular depth estimation using a deep neural network with the lowest latency and highest throughput on an embedded platform that can be carried by a micro aerial vehicle.1 This throughput is achieved with a batch size of one and 32-bit floating point precision. Throughput can be increased by using a larger batch size (at the cost of higher latency), and/or reducing bitwidths through quantization.2 Accuracy metrics are defined in Section IV-A.
We address the following question: is it possible to reconstruct the geometry of an unknown environment using sparse and incomplete depth measurements? This problem is relevant for a resource-constrained robot that has to navigate and map an environment, but does not have enough on-board power or payload to carry a traditional depth sensor (e.g., a 3D lidar) and can only acquire few (point-wise) depth measurements. In general, reconstruction from incomplete data is not possible, but when the robot operates in man-made environments, the depth exhibits some regularity (e.g., many planar surfaces with few edges); we leverage this regularity to infer depth from incomplete measurements. Our formulation bridges robotic perception with the compressive sensing literature in signal processing. We exploit this connection to provide formal results on exact depth recovery in 2D and 3D problems. Taking advantage of our specific sensing modality, we also prove novel and more powerful results to completely characterize the geometry of the signals that we can reconstruct. Our results directly translate to practical algorithms for depth reconstruction; these algorithms are simple (they reduce to solving a linear program), and robust to noise. We test our algorithms on real and simulated data, and show that they enable accurate depth reconstruction from a handful of measurements, and perform well even when the assumption of structured environment is violated.
We consider the case in which a robot has to navigate in an unknown environment but does not have enough on-board power or payload to carry a traditional depth sensor (e.g., a 3D lidar) and thus can only acquire a few (point-wise) depth measurements. We address the following question: is it possible to reconstruct the geometry of an unknown environment using sparse and incomplete depth measurements? Reconstruction from incomplete data is not possible in general, but when the robot operates in man-made environments, the depth exhibits some regularity (e.g., many planar surfaces with only a few edges); we leverage this regularity to infer depth from a small number of measurements. Our first contribution is a formulation of the depth reconstruction problem that bridges robot perception with the compressive sensing literature in signal processing. The second contribution includes a set of formal results that ascertain the exactness and stability of the depth reconstruction in 2D and 3D problems, and completely characterize the geometry of the profiles that we can reconstruct. Our third contribution is a set of practical algorithms for depth reconstruction: our formulation directly translates into algorithms for depth estimation based on convex programming. In real-world problems, these convex programs are very large and general-purpose solvers are relatively slow. For this reason, we discuss ad-hoc solvers that enable fast depth reconstruction in real problems. The last contribution is an extensive experimental evaluation in 2D and 3D problems, including Monte Carlo runs on simulated instances and testing on multiple real datasets. Empirical results confirm that the proposed approach ensures accurate depth reconstruction, outperforms interpolation-based strategies, and performs well even when the assumption of structured environment is violated. SUPPLEMENTAL MATERIAL• Video demonstrations: https://youtu.be/vE56akCGeJQ• Source code: https://github.com/sparse-depth-sensing
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.