2021
DOI: 10.1016/j.robot.2020.103701
On deep learning techniques to boost monocular depth estimation for autonomous navigation

Cited by 31 publications (10 citation statements)
References 164 publications (255 reference statements)
“…Although photogrammetry and multi-view stereo have a long history in the field of computer vision, height or depth estimation from a single-view image (excluding traditional photoclinometry [13][14][15]) only started being considered feasible in recent years, alongside the great success of deep learning techniques; single-image DTM/height/depth estimation is generally referred to as "monocular depth estimation" (MDE). With a variety of potential applications in robotics, autonomous driving, virtual reality, etc., several hundred MDE methods/networks have been proposed [16][17][18][19] over the last 7 years. In terms of training mechanism, these MDE methods can be classified into supervised methods, which require ground-truth depth images for training, and unsupervised methods, which exploit 3D geometry and require only multi-view images as input.…”
Section: Previous Work
confidence: 99%
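The supervised/unsupervised split described in the statement above can be sketched at the loss-function level. The snippet below is a minimal illustrative sketch, not code from any of the surveyed papers: a supervised loss compares predicted depth against ground-truth depth, while a self-supervised (unsupervised) loss instead compares the target image with a view reconstructed from another viewpoint using the predicted depth. Both function names are hypothetical.

```python
import numpy as np

def supervised_depth_loss(pred_depth, gt_depth):
    """Supervised MDE: mean absolute error against ground-truth depth.
    Requires a depth sensor (e.g., LiDAR) at training time."""
    return float(np.mean(np.abs(pred_depth - gt_depth)))

def photometric_loss(target_image, reconstructed_image):
    """Self-supervised MDE proxy: compare the target view with a view
    warped from another viewpoint using the predicted depth and camera
    geometry. Only multi-view images are needed, no ground-truth depth."""
    return float(np.mean(np.abs(target_image - reconstructed_image)))
```

In practice the self-supervised variant also needs the differentiable warping step (view synthesis from depth and relative pose), which is omitted here for brevity.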
“…In addition to the above, some other work has focused on the topic of "depth completion" [19], which is distinct from but directly relevant to MDE, aiming at improving the quality of MDE results. These works generally cover MDE denoising [55,56] and MDE refinement [57,58].…”
Section: Previous Work
confidence: 99%
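As an illustration of what depth completion does at its simplest, the sketch below fills invalid pixels of a depth map from the mean of their valid 4-neighbours. The function name and the single-pass averaging scheme are hypothetical stand-ins for intuition only; the cited denoising and refinement methods [55-58] are learned, far more sophisticated corrections.

```python
import numpy as np

def fill_depth_holes(depth, invalid=0.0):
    """Single-pass hole filling: replace each invalid pixel with the mean
    of its valid 4-neighbours (left unchanged if no neighbour is valid)."""
    out = depth.copy()
    h, w = depth.shape
    for y in range(h):
        for x in range(w):
            if depth[y, x] == invalid:
                nbrs = [depth[ny, nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w and depth[ny, nx] != invalid]
                if nbrs:
                    out[y, x] = sum(nbrs) / len(nbrs)
    return out
```

A real depth-completion network would additionally condition on the RGB image to decide where depth discontinuities belong.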
“…MDE can be based on LiDAR supervision, on stereo/SfM self-supervision, or on combinations thereof; in all cases, the LiDAR and stereo data and the SfM computations are required only at training time, not at testing time. We refer to [6] for a review of the MDE state of the art. In this paper, to isolate the multi-modal co-training performance assessment as much as possible from the MDE performance, we have chosen the top-performing supervised method proposed by Yin et al. [18].…”
Section: Related Work
confidence: 99%
“…Supervised deep learning enables accurate computer vision models. Key to this success is access to raw sensor data (i.e., images) with ground truth (GT) for the visual task at hand (e.g., image classification [1], object detection [2] and recognition [3], pixel-wise instance/semantic segmentation [4,5], monocular depth estimation [6], 3D reconstruction [7], etc.). The supervised training of such computer vision models, which are based on convolutional neural networks (CNNs), is known to require very large amounts of images with GT [8].…”
Section: Introduction
confidence: 99%
“…Supervised deep learning is enabling accurate computer vision models. Key to this success is access to raw sensor data (i.e., images) with ground truth (GT) for the visual task at hand (e.g., image classification [22], object detection [18] and recognition [25], pixel-wise instance/semantic segmentation [29,31], monocular depth estimation [3], 3D reconstruction [13], etc.). The supervised training of such computer vision models, which are based on convolutional neural networks (CNNs), is known to require very large amounts of images with GT [23].…”
Section: Introduction
confidence: 99%