2018 24th International Conference on Pattern Recognition (ICPR)
DOI: 10.1109/icpr.2018.8546189

End-to-end Multi-Modal Multi-Task Vehicle Control for Self-Driving Cars with Visual Perceptions

Abstract: Convolutional Neural Networks (CNN) have been successfully applied to autonomous driving tasks, many in an end-to-end manner. Previous end-to-end steering control methods take an image or an image sequence as the input and directly predict the steering angle with a CNN. Although single-task learning of steering angles has reported good performance, the steering angle alone is not sufficient for vehicle control. In this work, we propose a multi-task learning framework to predict the steering angle and speed contr…
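
The abstract describes a shared visual encoder with separate heads for the steering angle and the speed control signal. A minimal PyTorch sketch of that pattern follows; the backbone layers, head sizes, and the loss weight `alpha` are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: layer sizes and the loss weighting `alpha`
# are assumptions, not the paper's actual architecture.
class MultiTaskDrivingNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared convolutional feature extractor over the input image.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific heads: one regresses the steering angle,
        # the other predicts a speed command.
        self.steer_head = nn.Sequential(nn.Linear(64, 50), nn.ReLU(), nn.Linear(50, 1))
        self.speed_head = nn.Sequential(nn.Linear(64, 50), nn.ReLU(), nn.Linear(50, 1))

    def forward(self, image):
        feat = self.backbone(image)
        return self.steer_head(feat), self.speed_head(feat)

def multi_task_loss(steer_pred, speed_pred, steer_gt, speed_gt, alpha=0.5):
    # Weighted sum of per-task regression losses (the weighting is an assumption).
    mse = nn.functional.mse_loss
    return mse(steer_pred, steer_gt) + alpha * mse(speed_pred, speed_gt)
```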

Cited by 126 publications (67 citation statements)
References 25 publications
“…The other fast-developing area of forefront road identification is based on image processing from monocular and binocular (stereo) cameras, with the extracted data used in Advanced Driving Assistance Systems (ADAS) (Krasner, Katz 2016; Prashanth et al. 2014) or self-driving vehicle systems (Yang et al. 2018; Mahmud et al. 2012; Milz et al. 2018). Monocular vision is usually used for determining weather and illumination (Gimonet et al. 2015; Cheng et al. 2018), path and obstacle detection (Nadav, Katz 2016), and road, lane, and road-edge detection and recognition (Yang et al. 2018; Van Hamme et al. 2013; Zhang, Wu 2009). Binocular vision can be efficiently used for object ranging and usually shows better performance than monocular vision, especially in the creation of depth maps and point clouds from visual data, achieving results comparable to or better than Light Detection And Ranging (LiDAR) (Smolyanskiy et al. 2018).…”
Section: System Adaptation In Advance
Citation type: mentioning; confidence: 99%
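
The binocular ranging described in this passage reduces, in its simplest form, to converting a disparity map into metric depth via depth = f * B / d. A minimal OpenCV sketch follows; the focal length, baseline, image paths, and matcher settings are placeholder assumptions, not values from any cited system.

```python
import cv2
import numpy as np

# Assumed camera parameters and input files for illustration only.
FOCAL_PX = 700.0      # focal length in pixels (assumed)
BASELINE_M = 0.54     # stereo baseline in metres (assumed)

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching on a rectified stereo pair.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point

# depth = f * B / d; mask out invalid (non-positive) disparities.
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
```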
“…Huge speed improvements are shown using Graphics Processing Unit (GPU) processing in embedded systems such as the Jetson TX2 (Smolyanskiy et al. 2018). The availability of benchmark datasets such as Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) by Menze and Geiger (2015) and Udacity (2016) promotes the development of even better processing methods for various use cases, and there is also research on simulated/synthetic data (Gimonet et al. 2015) and newly collected datasets such as Shanghai Automotive Industry Corporation (SAIC) (Yang et al. 2018).…”
Section: System Adaptation In Advance
Citation type: mentioning; confidence: 99%
“…Additional research is being conducted on multi-modal learning [25,15,8]. Multi-modal learning involves relating information from multiple types of input.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
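
A common way to relate information from multiple input types, as this passage describes, is to encode each modality separately and fuse the resulting features. A minimal PyTorch sketch follows, using an image plus a scalar speed measurement as the second modality purely as an example; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative fusion sketch: the choice of a speed scalar as the second
# modality and all layer sizes are assumptions.
class MultiModalNet(nn.Module):
    def __init__(self):
        super().__init__()
        # One branch per modality.
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.speed_branch = nn.Sequential(nn.Linear(1, 16), nn.ReLU())
        # Fusion over the concatenated per-modality features.
        self.fusion = nn.Sequential(nn.Linear(32 + 16, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, image, speed):
        fused = torch.cat([self.image_branch(image), self.speed_branch(speed)], dim=1)
        return self.fusion(fused)  # e.g. a steering prediction
```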
“…The concurrent work of [6] presents a multi-task multi-modal approach for autonomous driving on model cars which makes use of a high-level directional command to direct the model car to turn left, right, or go straight. However, the directional command is simply used to select between separate MTL networks trained on specific commands. [25] also presents a multi-task multi-modal approach to autonomous driving on the road using the secondary input of past inferred driving speeds. Thus, while this is multi-modal learning, it is not directly comparable to our approach, in which higher-level information is inserted into the network.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
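
The command-conditioned pattern contrasted here, where a discrete directional command routes shared image features through one of several output heads, can be sketched as below. This illustrates the general branching idea rather than the exact architecture of [6] or [25]; the command set and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

# Sketch of command-conditioned branching; sizes and command set assumed.
class BranchedPolicy(nn.Module):
    NUM_COMMANDS = 3  # left, straight, right (assumed)

    def __init__(self, feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One output head per high-level command.
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, 1) for _ in range(self.NUM_COMMANDS)]
        )

    def forward(self, image, command):
        # `command` holds one integer branch index per sample.
        feat = self.backbone(image)
        out = torch.stack([h(feat) for h in self.heads], dim=1)  # (B, 3, 1)
        return out[torch.arange(feat.size(0)), command]          # pick one head per sample
```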
“…Our second contribution is an extensive quantitative study, using our realistic simulator, of how label augmentation, failure-case generation, and training-data resampling influence online performance under error accumulation. To do so, we leverage a dataset containing more than 200 hours of driving, to be compared with 72 hours [4] and 5 hours [11] [13]. This bigger dataset gives our network a better generalization capacity, i.e.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
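
Training-data resampling of the kind mentioned above is often done by reweighting samples against the steering-label distribution, since driving logs are dominated by near-zero steering. A small NumPy sketch follows; the bin count and the synthetic labels are illustrative assumptions, not the cited paper's procedure.

```python
import numpy as np

# Draw samples with probability inversely proportional to the frequency
# of their steering-angle bin, flattening the label distribution.
def resampling_weights(steering_angles, num_bins=25):
    counts, edges = np.histogram(steering_angles, bins=num_bins)
    bin_idx = np.clip(np.digitize(steering_angles, edges[1:-1]), 0, num_bins - 1)
    weights = 1.0 / np.maximum(counts[bin_idx], 1)
    return weights / weights.sum()

angles = np.random.randn(10_000) * 0.1          # stand-in for logged steering labels
p = resampling_weights(angles)
resampled = np.random.choice(len(angles), size=len(angles), p=p)
```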