2018 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv.2018.00145

Driving Scene Perception Network: Real-Time Joint Detection, Depth Estimation and Semantic Segmentation

Abstract: As the demand for high-level autonomous driving has increased in recent years, and visual perception is one of the critical capabilities for enabling fully autonomous driving, in this paper we introduce an efficient approach for simultaneous object detection, depth estimation and pixel-level semantic segmentation using a shared convolutional architecture. The proposed network model, which we name the Driving Scene Perception Network (DSPNet), uses multi-level feature maps and multi-task learning to improve the …
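As a purely illustrative sketch (not the authors' code; every function name below is hypothetical), the structural idea behind DSPNet-style joint models is a shared backbone computed once per image, whose features are reused by lightweight task heads — which is what makes joint inference cheaper than running three separate networks:

```python
# Illustrative sketch of a shared-backbone multi-task model:
# features are computed once and reused by three task heads.

def shared_backbone(image):
    # stand-in for convolutional feature extraction (one value per row)
    return [sum(row) for row in image]

def detection_head(features):
    return max(features)                  # e.g. an objectness score

def depth_head(features):
    return [f * 0.5 for f in features]    # per-row pseudo-depth

def segmentation_head(features):
    return [f > 0 for f in features]      # per-row foreground mask

def perceive(image):
    feats = shared_backbone(image)        # computed once, shared by all heads
    return {
        "detection": detection_head(feats),
        "depth": depth_head(feats),
        "segmentation": segmentation_head(feats),
    }
```

The design choice the paper exploits is that the expensive part (`shared_backbone` here) is amortized across tasks, while each head stays small.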


Cited by 49 publications (28 citation statements)
References 27 publications
“…Each task has only 2 specific parametric layers, while everything else is shared. Several works in computer vision have been following this strategy; in particular, Eigen & Fergus [9] trained a single architecture (but with different copies) to predict depth, surface normals and semantic segmentation, Kokkinos [10] proposed a universal network to tackle 7 different vision tasks, Dvornik et al [11] found it beneficial to do joint semantic segmentation and object detection, while Kendall et al [12] learned optimal weights to perform instance segmentation, semantic segmentation and depth estimation all at once. Chen et al [13] built a single network with the ResNet-50 [14] backbone performing joint semantic segmentation, depth estimation and object detection. To alleviate the problem of imbalanced annotations, Kokkinos [10] chose to accumulate the gradients for each task until a certain number of examples per task is seen, while Dvornik et al [11] simply resorted to keeping the branch with no ground truth available intact until at least one example of that modality is seen.…”
Section: Related Work
confidence: 99%
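The per-task gradient-accumulation strategy attributed to Kokkinos [10] above can be sketched in framework-free Python (a hypothetical illustration: the `make_accumulator` name, the scalar "gradient", and the threshold are all assumptions, not the paper's actual implementation):

```python
# Sketch: to handle imbalanced annotations, gradients for each task are
# summed until that task has seen `threshold` examples, and only then
# returned as one (averaged) update for the shared weights.

def make_accumulator(tasks, threshold):
    state = {t: {"grad_sum": 0.0, "count": 0} for t in tasks}

    def accumulate(task, grad):
        s = state[task]
        s["grad_sum"] += grad
        s["count"] += 1
        if s["count"] >= threshold:
            update = s["grad_sum"] / s["count"]   # averaged per-task update
            s["grad_sum"], s["count"] = 0.0, 0    # reset for the next window
            return update                          # caller applies this update
        return None                                # keep accumulating

    return accumulate
```

This way a rarely-annotated task still contributes updates of comparable magnitude to a frequently-annotated one, instead of firing noisy single-example gradients.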
“…Multi-task learning (Kokkinos, 2017), (Chen et al, 2018b), (Neven et al, 2017) has been gaining significant popularity over the past few years, as it has proven to be very efficient for embedded deployment. Multiple tasks like object detection, semantic segmentation, depth estimation, etc. can be solved simultaneously using a single model.…”
Section: Multi-task Learning
confidence: 99%
“…MultiNet [26] combines classification, detection and segmentation in a single architecture, which is based on ResNet [28] and consists of three shared encoding layers followed by task-independent decoding layers. Reference [29] introduces an efficient approach for simultaneous object detection, depth estimation and pixel-level semantic segmentation using a shared convolutional architecture. Reference [27] proposes a unified neural network to detect drivable areas, lane lines, and traffic objects simultaneously, as these three tasks are among the most important for autonomous driving.…”
Section: Introduction
confidence: 99%