Learning Monocular Depth Estimation Infusing Traditional Stereo Knowledge

Tosi, Fabio; Aleotti, Filippo; Poggi, Matteo; Mattoccia, Stefano

doi:10.1109/cvpr.2019.01003

Cited by 202 publications

(176 citation statements)

References 59 publications

Mentioning: 173


“…Similar to before, we also show results for our method without pretraining ("Ours HR Resnet50 w/o pretraining"). Our non-pretrained model beats SuperDepth [26] in six out of seven metrics (tied in one), and compares favourably to the concurrent work monoResMatch [30], which makes use of a significantly more complex network compared to our encoder-decoder architecture.…”

Section: Depth From Color Tournament (mentioning)

confidence: 77%

“…The concurrent work monoResMatch by Tosi et al [30] also exploits proxy ground truth labels generated with a traditional stereo matching method [13]. The inclusion of the proxy supervision is shown to greatly improve accuracy over using a standard self-supervised loss.…”

Section: Additional Supervision (mentioning)

confidence: 99%

“…Similarly, Zhu et al [43] add a supervised loss [1] to solve for optical flow and Kuznietsov et al [18] add a supervised loss for depth estimation from LiDAR. Concurrently proposed monoResMatch [30] uses this method to incorporate a proxy-supervised signal, albeit using a reverse Huber loss [19] as opposed to log L1. The addition of supervised losses change the objective function that is being minimized; one could view the additional term as a form of regularization, constraining the network prediction to adhere to the proposed depth values.…”

Section: Baseline Loss Functions (mentioning)

confidence: 99%
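
For readers unfamiliar with the reverse Huber (berHu) loss named in the statement above, the following is a minimal sketch of its standard definition: linear for small residuals, quadratic for large ones. It is not taken from the monoResMatch code. The function names `berhu` and `berhu_loss`, the plain-list inputs, and the threshold rule `c = 0.2 * max|residual|` are illustrative assumptions (the 0.2 ratio is a common convention, not a value stated on this page).

```python
def berhu(residual: float, c: float) -> float:
    """Reverse Huber penalty for one residual; c > 0 is the switch-over threshold."""
    a = abs(residual)
    if a <= c:
        # Small residuals are penalised linearly, like L1.
        return a
    # Large residuals are penalised quadratically; the two branches meet at a == c.
    return (a * a + c * c) / (2.0 * c)


def berhu_loss(pred, proxy, c_ratio=0.2):
    """Mean berHu penalty between predictions and proxy labels (hypothetical helper).

    The threshold is set per batch as c_ratio * max |residual| (an assumption).
    """
    residuals = [p - q for p, q in zip(pred, proxy)]
    max_abs = max(abs(r) for r in residuals)
    if max_abs == 0.0:
        # Perfect agreement: avoid a zero threshold and return zero loss.
        return 0.0
    c = c_ratio * max_abs
    return sum(berhu(r, c) for r in residuals) / len(residuals)
```

In this form the penalty behaves like L1 near zero and grows faster than L1 for residuals above the threshold, which is the opposite arrangement to the ordinary Huber loss.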


Self-Supervised Monocular Depth Hints

Watson, Firman, Brostow, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Monocular depth estimators can be trained with various forms of self-supervision from binocular-stereo data to circumvent the need for high-quality laser scans or other ground-truth data. The disadvantage, however, is that the photometric reprojection losses used with self-supervised learning typically have multiple local minima. These plausible-looking alternatives to ground truth can restrict what a regression network learns, causing it to predict depth maps of limited quality. As one prominent example, depth discontinuities around thin structures are often incorrectly estimated by current state-of-the-art methods. Here, we study the problem of ambiguous reprojections in depth prediction from stereo-based self-supervision, and introduce Depth Hints to alleviate their effects. Depth Hints are complementary depth suggestions obtained from simple off-the-shelf stereo algorithms. These hints enhance an existing photometric loss function, and are used to guide a network to learn better weights. They require no additional data, and are assumed to be right only sometimes. We show that using our Depth Hints gives a substantial boost when training several leading self-supervised-from-stereo models, not just our own. Further, combined with other good practices, we produce state-of-the-art depth predictions on the KITTI benchmark.

We demonstrate that our selective training using Depth Hints is a general enhancement that can improve multiple leading self-supervised training algorithms, allowing our implementations to reach better minima. The Depth Hints can come from the same stereo image data, via, e.g., OpenCV's stereo estimates [13,14]. We show that our selective training with Depth Hints, coupled with sensible network design choices, leads us to outperform most other algorithms. We achieve state-of-the-art results on the KITTI dataset [8], outperforming both our baseline model and previously published results.


“…Among methods tested which are dedicated to monocular cameras, we can find approaches such as SfmLearner [28], MonoResMatch [29], Monodepth [14] and Monodepth2 [30]. Except [28], which is trained on monocular image sequences by learning the structure from motion elements using three frames snippet, all of these algorithms require a pair of images from a calibrated stereoscopic camera in order to be able to do the training.…”

Section: Algorithm For Monocular Camera (mentioning)

confidence: 99%

Deep Learning for Real-Time 3D Multi-Object Detection, Localisation, and Tracking: Application to Smart Mobility

Mauri, Khemmar, Decoux, et al. 2020

Sensors


In core computer vision tasks, we have witnessed significant advances in object detection, localisation and tracking. However, there are currently no methods that detect, localize and track objects in road environments while taking real-time constraints into account. In this paper, our objective is to develop a deep learning multi-object detection and tracking technique applied to road smart mobility. Firstly, we propose an effective detector based on YOLOv3, which we adapt to our context. Subsequently, to successfully localize the detected objects, we put forward an adaptive method aiming to extract 3D information, i.e., depth maps. To do so, a comparative study is carried out taking into account two approaches: Monodepth2 for monocular vision and MADNet for stereoscopic vision. These approaches are then evaluated over datasets containing depth information in order to identify the solution that performs best under real-time conditions. Object tracking is necessary in order to mitigate the risks of collisions. Unlike traditional tracking approaches, which require target initialization beforehand, our approach uses information from object detection and distance estimation to initialize targets and to track them later. Specifically, we propose to improve the SORT approach for 3D object tracking. We introduce an extended Kalman filter to better estimate the position of objects. Extensive experiments carried out on the KITTI dataset prove that our proposal outperforms state-of-the-art approaches.


“…On the other hand, while existing works leverage such knowledge at training time only, we deploy a monocular VO algorithm to obtain geometrical priors to feed our network with. Being such priors sourced by a monocular setup, they are available at inference time in contrast to others available from stereo images [40,46] and thus available at training time only.…”

Section: Related Work (mentioning)

confidence: 99%

Enhancing Self-Supervised Monocular Depth Estimation with Traditional Visual Odometry

Andraghetti, Myriokefalitakis, Dovesi, et al. 2019

2019 International Conference on 3D Vision (3DV)

Self Cite


Estimating depth from a single image represents an attractive alternative to more traditional approaches leveraging multiple cameras. In this field, deep learning has yielded outstanding results at the cost of needing large amounts of data labeled with precise depth measurements for training. This issue is softened by self-supervised approaches that leverage monocular sequences or stereo pairs in place of expensive ground-truth depth annotations. This paper further improves monocular depth estimation by integrating a geometrical prior into existing self-supervised networks. Specifically, we propose a sparsity-invariant autoencoder able to process the output of conventional visual odometry algorithms working in synergy with depth-from-mono networks. Experimental results on the KITTI dataset show that by exploiting the geometrical prior, our proposal: i) outperforms existing approaches in the literature and ii) couples well with both compact and complex depth-from-mono architectures, allowing for its deployment on high-end GPUs as well as on embedded devices (e.g., NVIDIA Jetson TX2).
