MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification

Lukas, Lukas

doi:10.1109/itsc.2019.8917177

Cited by 19 publications

(14 citation statements)

References 50 publications

(102 reference statements)


“…Considering that most of the research on image depth estimation now was pixel-level depth estimation, we compared the instance-level depth estimation we invented with them. According to the latest research on pixel-level depth estimation (Fu et al, 2018; Ren et al, 2019; Liebel & Körner, 2019), for example, the relative absolute error of the depth estimation of DORN on the KITTI dataset was 8.78% (Fu et al, 2018), and the relative absolute error of the depth estimation of MultiDepth on the same dataset was 13.82% (Liebel & Körner, 2019). Compared with pixel-…”

Section: Results of 3D Object Localization and Detection (mentioning)

confidence: 99%

MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time

Zhou,

Peng,

Long

et al. 2020

Preprint


Monocular multi-object detection and localization in 3D space has been proven to be a challenging task. The MoNet3D algorithm is a novel and effective framework that can predict the 3D position of each object in a monocular image and draw a 3D bounding box for each object. The MoNet3D method incorporates prior knowledge of the spatial geometric correlation of neighbouring objects into the deep neural network training process to improve the accuracy of 3D object localization. Experiments on the KITTI dataset show that the accuracy for predicting the depth and horizontal coordinates of objects in 3D space can reach 96.25% and 94.74%, respectively. Moreover, the method can realize the real-time image processing at 27.85 FPS, showing promising potential for embedded advanced driving-assistance system applications. Our code is publicly available at https://github.com/CQUlearningsystemgroup/YicongPeng.
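
The citation statement above compares methods by relative absolute error. As a minimal sketch, assuming the usual definition of that metric (the mean of |predicted - true| / true over pixels with valid ground truth), it could be computed as follows. The function name and the convention that non-positive ground-truth values mark missing measurements are assumptions for illustration, not taken from any of the cited papers.

```python
def abs_rel_error(pred_depths, gt_depths):
    """Mean absolute relative error over pixels with valid ground truth.

    Assumption: ground-truth values <= 0 mark missing measurements
    and are skipped.
    """
    if len(pred_depths) != len(gt_depths):
        raise ValueError("prediction and ground truth must have equal length")
    errors = [
        abs(pred - gt) / gt
        for pred, gt in zip(pred_depths, gt_depths)
        if gt > 0
    ]
    if not errors:
        raise ValueError("no valid ground-truth depths")
    return sum(errors) / len(errors)


# Two valid pixels, each off by 10 % of the true depth; the third is skipped.
example = abs_rel_error([11.0, 18.0, 5.0], [10.0, 20.0, 0.0])
```

Multiplying the result by 100 gives a percentage comparable in form to the figures quoted above.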



“…Similarly, Kendall et al [15] improved depth estimation results for road scenes by evaluating semantic and instance labels at the same time. A different approach on single-image depth estimation using multi-task learning has been pursued by Liebel et al [6] who posed this natural regression task as the classification of discrete depth ranges as an additional auxiliary task and jointly solved for both targets. Other problems that have recently been tackled by multi-task learning include facial landmark detection [34] and person attribute classification [22].…”

Section: Related Work (mentioning)

confidence: 99%

“…As a secondary source of information, buildings are classified based on their roof geometries with classes for flat and non-flat roofs. In the closely related field of depth estimation it can be observed that the optimization of regression tasks is generally harder than the optimization of corresponding classification tasks [12,6]. By restricting the highly non-convex optimization space through this constraint, this auxiliary objective doubles as a regularization measure [16].…”

Section: Segmentation Loss (mentioning)

confidence: 99%

A generalized multi-task learning approach to stereo DSM filtering in urban areas

Lukas

Bittner

2020

ISPRS Journal of Photogrammetry and Remote Sensing

Self Cite


City models and height maps of urban areas serve as a valuable data source for numerous applications, such as disaster management or city planning. While this information is not globally available, it can be substituted by digital surface models (DSMs), automatically produced from inexpensive satellite imagery. However, stereo DSMs often suffer from noise and blur. Furthermore, they are heavily distorted by vegetation, which is of lesser relevance for most applications. Such basic models can be filtered by convolutional neural networks (CNNs), trained on labels derived from digital elevation models (DEMs) and 3D city models, in order to obtain a refined DSM. We propose a modular multi-task learning concept that consolidates existing approaches into a generalized framework. Our encoder-decoder models with shared encoders and multiple task-specific decoders leverage roof type classification as a secondary task and multiple objectives including a conditional adversarial term. The contributing single-objective losses are automatically weighted in the final multi-task loss function based on learned uncertainty estimates. We evaluated the performance of specific instances of this family of network architectures. Our method consistently outperforms the state of the art on common data, both quantitatively and qualitatively, and generalizes well to a new dataset of an independent study area.
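
The abstract above mentions single-objective losses that are weighted automatically based on learned uncertainty estimates. A minimal sketch of one commonly used simplified form of such a weighting, total = sum_i exp(-s_i) * L_i + s_i with one learned log-variance s_i per task, is given below. The exact formulation used in the cited work is not stated here, so the formula and all names are assumptions.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses L_i with learned log-variances s_i.

    Assumed simplified form: total = sum_i exp(-s_i) * L_i + s_i.
    A large s_i down-weights task i, while the additive s_i term
    penalizes inflating the uncertainty without bound.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("need exactly one log-variance per task loss")
    return sum(
        math.exp(-s) * loss + s
        for loss, s in zip(task_losses, log_variances)
    )


# With all log-variances at zero, this reduces to the plain sum of losses.
total = uncertainty_weighted_loss([1.0, 2.0], [0.0, 0.0])
```

In a real training setup the log-variances would be trainable parameters optimized together with the network weights; here they are plain floats to keep the sketch self-contained.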


“…In addition, they exploited a special module to remove the shadows existing in real-world images when applying their model to real data (34). Liebel et al proposed MultiDepth, a sort of training strategy, to solve the problems of notorious instability and slow convergence in depth training, by developing an auxiliary task of depth interval classification (35).…”

Section: Related Work (mentioning)

confidence: 99%

Novel Hybrid Neural Network for Dense Depth Estimation using On-Board Monocular Images

Jia

Pei

Yang

et al. 2020

Transportation Research Record


Depth information from still 2D images plays an important role in automated driving, driving safety, and robotics. Monocular depth estimation is generally considered an ill-posed and inherently ambiguous problem, and a pressing issue is how to obtain global information efficiently, since pure convolutional neural networks (CNNs) merely extract local information. To that end, some previous works utilized conditional random fields (CRFs) to obtain the global information, but these are notoriously difficult to optimize. In this paper, a novel hybrid neural network is proposed to solve that, and concurrently a dense depth map is predicted from the monocular still image. Specifically: first, the deep residual network is utilized to obtain multi-scale local information and then feature correlation (FCL) blocks are used to correlate these features. Finally, the feature selection attention-based mechanism is adopted to fuse the multi-layer features, and the multi-layer recurrent neural networks (RNNs) are utilized with bidirectional long short-term memory (Bi-LSTM) unit as the output layer. Furthermore, a novel logarithm exponential average error (LEAE) is proposed to overcome the over-weighted problem. The multi-scale feature correlation network (MFCN) is evaluated on large-scale KITTI benchmarks (LKT), which is a subset of KITTI raw dataset, and NYU depth v2. The experiments indicate that the proposed unified network outperforms existing methods. This method also updates the state-of-the-art performance on LKT datasets. Importantly, the depth estimation method can be widely used for collision risk assessment and avoidance in driving assistance systems or automated pilot systems to achieve safety in a more economical and convenient way.
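
The citation statements on this page describe MultiDepth as adding depth interval classification as an auxiliary task next to depth regression. A minimal per-pixel sketch of that idea follows. The depth range, the number of intervals, the log-spaced binning, the L1 regression term, and all function names are assumptions chosen for illustration; the statements above do not specify them.

```python
import math


def depth_to_bin(depth, d_min=1.0, d_max=80.0, num_bins=8):
    """Map a metric depth to the index of its log-spaced depth interval.

    Assumption: depths are clipped to [d_min, d_max] before binning.
    """
    clipped = min(max(depth, d_min), d_max)
    frac = math.log(clipped / d_min) / math.log(d_max / d_min)
    return min(int(frac * num_bins), num_bins - 1)


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def joint_loss(pred_depth, class_logits, gt_depth, cls_weight=1.0):
    """L1 regression loss plus weighted auxiliary classification loss."""
    regression = abs(pred_depth - gt_depth)
    classification = cross_entropy(class_logits, depth_to_bin(gt_depth))
    return regression + cls_weight * classification


# One pixel: regression off by 2 m, classifier uniform over 8 intervals.
loss = joint_loss(12.0, [0.0] * 8, 10.0)
```

Both terms are computed from the same ground-truth depth, so the auxiliary task needs no extra labels.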

