2019 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2019.8793637
Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera

Abstract: Depth completion, the technique of estimating a dense depth image from sparse depth measurements, has a variety of applications in robotics and autonomous driving. However, depth completion faces 3 main challenges: the irregularly spaced pattern in the sparse depth input, the difficulty in handling multiple sensor modalities (when color images are available), as well as the lack of dense, pixel-level ground truth depth labels. In this work, we address all these challenges. Specifically, we develop a deep regre…
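To make the setup concrete, below is a minimal PyTorch-style sketch (hypothetical, not the authors' architecture) of a regression network that takes an RGB image concatenated with a sparse depth channel, where missing LiDAR returns are encoded as zeros, and outputs a dense depth map. The masked L2 loss illustrates supervision from semi-dense labels when such labels are available.

```python
import torch
import torch.nn as nn

class SparseToDenseNet(nn.Module):
    """Toy encoder-decoder: RGB (3 ch) + sparse depth (1 ch) -> dense depth (1 ch)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),  # keep depth non-negative
        )

    def forward(self, rgb, sparse_depth):
        # Missing LiDAR returns are zeros in sparse_depth.
        x = torch.cat([rgb, sparse_depth], dim=1)  # (B, 4, H, W)
        return self.decoder(self.encoder(x))

def masked_l2_loss(pred, gt):
    """Supervised variant: penalize only pixels where (semi-dense) ground truth exists."""
    mask = (gt > 0).float()
    return ((pred - gt) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)
```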

Cited by 406 publications (602 citation statements)
References 42 publications
“…However, the assistance from other modalities, e.g., color images, can significantly improve the completion accuracy. Ma et al. concatenated the sparse depth and color image as the inputs of an off-the-shelf network [26] and further explored the feasibility of self-supervised LiDAR completion [23]. Moreover, [14,16,33,4] proposed different network architectures to better exploit the potential of the encoder-decoder framework.…”
Section: Related Work (mentioning)
confidence: 99%
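The self-supervised variant referenced above typically replaces dense labels with a photometric consistency term between temporally adjacent frames. The sketch below is a generic version of such a loss, assuming a pinhole camera with known intrinsics K and a known relative pose T_t_to_s (for instance estimated from the sparse LiDAR points); it is an illustration only, and the full framework in [23] combines several loss terms.

```python
import torch
import torch.nn.functional as F

def photometric_loss(rgb_t, rgb_s, depth_t, K, T_t_to_s):
    """Warp source frame rgb_s into the target view using predicted depth_t (B,1,H,W)
    and a 4x4 relative pose T_t_to_s, then compare with rgb_t using an L1 penalty."""
    B, _, H, W = rgb_t.shape
    device = rgb_t.device
    # Pixel grid in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)   # (3, H*W)
    # Back-project to 3D in the target camera, then transform into the source camera.
    cam = torch.linalg.inv(K) @ pix                                          # (3, H*W)
    cam = cam.unsqueeze(0) * depth_t.reshape(B, 1, -1)                       # (B, 3, H*W)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)  # (B, 4, H*W)
    cam_s = (T_t_to_s @ cam_h)[:, :3]                                        # (B, 3, H*W)
    proj = K @ cam_s
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    rgb_s_warped = F.grid_sample(rgb_s, grid, align_corners=True)
    return (rgb_s_warped - rgb_t).abs().mean()
```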
“…With the advances of deep learning methods, many depth completion approaches based on convolutional neural networks (CNNs) have been proposed. The mainstream of these methods is to directly input the sparse depth maps (with/without color images) into an encoder-decoder network and predict dense depth maps [26,16,36,15,10,23,2]. These black-box methods force the CNN to learn a mapping from sparse depth measurements to dense maps, which is generally a challenging task and leads to unsatisfactory completion results, as shown in Fig.…”
Section: Introduction (mentioning)
confidence: 99%
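For readers unfamiliar with what the sparse depth input looks like, the following sketch (a hypothetical helper, not code from any of the cited works) projects LiDAR points already expressed in the camera frame into an H×W image, leaving pixels without a return at zero; this produces the irregularly spaced pattern the abstract refers to.

```python
import numpy as np

def lidar_to_sparse_depth(points_cam, K, H, W):
    """Project LiDAR points (N x 3, camera frame) into an H x W depth image.
    Pixels without a return stay 0, matching the sparse-input convention above."""
    z = points_cam[:, 2]
    pts = points_cam[z > 0]                      # keep points in front of the camera
    uvw = K @ pts.T                              # (3, N)
    u = np.round(uvw[0] / uvw[2]).astype(int)
    v = np.round(uvw[1] / uvw[2]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth = np.zeros((H, W), dtype=np.float32)
    # Keep the nearest return when several points land on the same pixel:
    # assign far-to-near so nearer values overwrite farther ones.
    order = np.argsort(-pts[inside][:, 2])
    depth[v[inside][order], u[inside][order]] = pts[inside][:, 2][order]
    return depth
```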
“…The predicted distance to a stop sign had a standard deviation of 1.7 m and the predicted distance to a traffic light had a standard deviation of 5.9 m. Given the scope of this study, these error values were considered acceptable. However, to more accurately predict these distance values, a more sophisticated technique could be used, such as incorporating a stereo camera or using a monocular depth estimation approach. For this study, the predicted distance to a stop sign was considered an approximate distance to the start of the intersection, and the predicted distance to a traffic light was considered an approximate distance to the end of the intersection.…”
Section: System Architecture (mentioning)
confidence: 99%
“…However, to more accurately predict these distance values, a more sophisticated technique could be used, such as incorporating a stereo camera or using a monocular depth estimation approach. 34 For this study, the predicted distance to a stop sign was considered an approximate distance to the start of the intersection, and the predicted distance to a traffic light was considered an approximate distance to the end of the intersection. For measurements that returned multiple detections, the mean predicted distance was used.…”
Section: Intersection Estimator (mentioning)
confidence: 99%
“…Interpolation techniques have been widely used in lots of computer vision and robotics tasks, which can be classified into two categories, i.e., temporal interpolation [1], [8], [14] and spatial interpolation [10], [12], [26]. In video processing, video interpolation aims to temporally generate an intermediate frame using two consecutive frames.…”
Section: Introduction (mentioning)
confidence: 99%
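As a point of reference for the temporal case mentioned in this excerpt, the simplest possible "interpolation" of an intermediate frame is a linear blend of its two neighbors. The learned methods cited in [1], [8], [14] estimate motion rather than blending, but the hypothetical sketch below shows the basic input/output of the task.

```python
import numpy as np

def blend_midframe(frame0, frame1, t=0.5):
    """Naive temporal interpolation baseline: a linear blend of two consecutive
    frames at time t in [0, 1]; serves only to illustrate the task's interface."""
    return (1.0 - t) * frame0.astype(np.float32) + t * frame1.astype(np.float32)
```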