2018
DOI: 10.48550/arxiv.1812.03079
Preprint

ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

Abstract: Our goal is to train a policy for autonomous driving via imitation learning that is robust enough to drive a real vehicle. We find that standard behavior cloning is insufficient for handling complex driving scenarios, even when we leverage a perception system for preprocessing the input and a controller for executing the output on the car: 30 million examples are still not enough. We propose exposing the learner to synthesized data in the form of perturbations to the expert's driving, which creates interesting…
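
The perturbation idea sketched in the abstract can be illustrated with a short, hypothetical snippet. This is not the paper's implementation: it simply takes an expert trajectory of (x, y) waypoints, pushes one middle waypoint laterally off the driven path, and re-fits a smooth curve through the perturbed point while keeping the endpoints fixed, so the learner is shown a deviation it must recover from. The function name, offset magnitude, and quadratic fit are all illustrative assumptions.

import numpy as np

def perturb_trajectory(traj, lateral_offset=1.0, anchor_idx=None):
    """Synthesize a perturbed version of an expert trajectory.

    traj: (N, 2) array of (x, y) waypoints driven by the expert.
    lateral_offset: how far (in meters) to push the anchor waypoint sideways.
    Returns an (N, 2) trajectory that starts and ends on the expert path but
    deviates in the middle, i.e. a recovery situation for the learner.
    """
    traj = np.asarray(traj, dtype=float)
    n = len(traj)
    if anchor_idx is None:
        anchor_idx = n // 2

    # Unit normal to the local heading at the anchor waypoint.
    heading = traj[min(anchor_idx + 1, n - 1)] - traj[max(anchor_idx - 1, 0)]
    heading /= np.linalg.norm(heading) + 1e-9
    normal = np.array([-heading[1], heading[0]])

    # Displace the anchor waypoint laterally off the expert path.
    perturbed_anchor = traj[anchor_idx] + lateral_offset * normal

    # Fit a smooth quadratic through (start, perturbed anchor, end) for each
    # coordinate, then resample N waypoints along it.
    t = np.linspace(0.0, 1.0, n)
    knots_t = np.array([0.0, t[anchor_idx], 1.0])
    new_traj = np.empty_like(traj)
    for d in range(2):
        knots = np.array([traj[0, d], perturbed_anchor[d], traj[-1, d]])
        coeffs = np.polyfit(knots_t, knots, deg=2)
        new_traj[:, d] = np.polyval(coeffs, t)
    return new_traj

# Example: a straight 20-waypoint expert path nudged 1.5 m sideways mid-way.
expert = np.stack([np.linspace(0, 50, 20), np.zeros(20)], axis=1)
perturbed = perturb_trajectory(expert, lateral_offset=1.5)
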

Cited by 141 publications (289 citation statements)
References 17 publications
“…In [45], [39], [44], a Convolutional Neural Network (CNN) is used to extract the scene's features. The learning power of CNNs is utilized in [42], [23], [11], [7] to implicitly learn both the interactions and the scene semantics. To this end, they render the scene semantics and the states of the agents in the scene into a multidimensional image and use CNNs to capture the underlying relations between dimensions.…”
Section: Previous Work
confidence: 99%
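
The rendering scheme this statement describes can be sketched as follows. The channel layout, resolution, and coordinate convention below are illustrative assumptions, not any cited paper's format: each semantic element (road layout, agent history, current agent positions) is drawn into its own channel of a top-down raster that a CNN later consumes.

import numpy as np

def render_scene(agent_tracks, road_polyline, size=200, res=0.5):
    """Rasterize scene semantics and agent states into an image-like tensor.

    agent_tracks: list of (T, 2) arrays of past (x, y) positions in meters,
                  expressed in the ego frame (ego at the raster center).
    road_polyline: (M, 2) array of lane-center points in the same frame.
    Returns a (3, size, size) float32 raster:
      channel 0: road layout, channel 1: agent history,
      channel 2: agent positions at the current timestep.
    """
    raster = np.zeros((3, size, size), dtype=np.float32)

    def to_px(xy):
        # Meters -> pixel indices, ego at the center; drop out-of-range points.
        px = (np.asarray(xy, dtype=float) / res + size / 2).astype(int)
        keep = ((px[:, 0] >= 0) & (px[:, 0] < size) &
                (px[:, 1] >= 0) & (px[:, 1] < size))
        return px[keep]

    for px in to_px(road_polyline):
        raster[0, px[1], px[0]] = 1.0

    for track in agent_tracks:
        track = np.asarray(track, dtype=float)
        for px in to_px(track):          # full past history of this agent
            raster[1, px[1], px[0]] = 1.0
        for px in to_px(track[-1:]):     # latest position only
            raster[2, px[1], px[0]] = 1.0
    return raster
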
“…Bearing the input domain structure of neural networks in mind, it is clear that the scene's contextual information should be represented in a way that is digestible for the networks. In most of the previous works on vehicle trajectory prediction [39], [45], [23], [11], [7], the scene's contextual information is rendered into image-like raster inputs, and 2D Convolutional Neural Networks (CNNs) are employed to learn an abstract representation. This is inspired by the success of CNNs in various computer vision tasks [26], [21], [19], [41], which has made rendered images and CNNs the standard input representations and processors, respectively.…”
Section: Introduction
confidence: 99%
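
A minimal encoder over such a raster input might look like the sketch below (PyTorch); the layer widths, strides, and pooling choice are arbitrary assumptions and are not taken from any of the cited models.

import torch
import torch.nn as nn

class RasterEncoder(nn.Module):
    """Tiny 2D CNN that maps a multi-channel BEV raster to a feature vector."""

    def __init__(self, in_channels=3, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),   # global pooling -> (B, feat_dim, 1, 1)
        )

    def forward(self, raster):
        # raster: (B, in_channels, H, W) top-down scene rendering.
        return self.backbone(raster).flatten(1)   # (B, feat_dim)

# Example: encode one 3-channel 200x200 raster into a 128-d scene feature.
encoder = RasterEncoder()
scene_feature = encoder(torch.rand(1, 3, 200, 200))
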
“…In relation to this distinction, the approaches can be categorized by input modality as raster-, polyline-, or sensor-based. Raster-based approaches [10], [5], [11], [12], [13], [14] take in Bird's Eye View (BEV) images of an agent's environment containing elements such as road lanes and the bounding boxes of other perceived agents. By doing so, the CNN can extract local spatial information from a single representational domain.…”
Section: Related Work
confidence: 99%
“…3D perception is an important part of an autonomous driving system [1,11]. To date, many 3D detectors have been proposed.…”
Section: Introduction
confidence: 99%
“…The main contributions of this work can be summarized as follows: (1) We show that the essential difference between point-based and voxel-based RoI feature extractors is the trade-off between location and structure information, and we point out that, compared to accurate position information, structural information is more important for 3D object detection; (2) We propose a plug-and-play module called the Self-Attention RoI Feature Extractor (SARFE) that automatically adjusts the local features within a proposal to obtain features with stronger structural information; (3) Our proposed method, SARFE, achieves new state-of-the-art performance on the cyclist class of the KITTI dataset [8] while retaining real-time capability.…”
Section: Introduction
confidence: 99%
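
The quoted statement does not spell out SARFE's architecture. As a rough, assumption-laden stand-in for the general idea (and not the authors' module), the sketch below applies a generic single-head self-attention layer to the set of local features pooled inside one proposal, so that each local feature is re-weighted by its relation to the other features in the same RoI.

import torch
import torch.nn as nn

class RoISelfAttention(nn.Module):
    """Generic single-head self-attention over the local features of one RoI."""

    def __init__(self, dim=128):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, roi_feats):
        # roi_feats: (B, N, dim) = N pooled local features per proposal.
        q, k, v = self.q(roi_feats), self.k(roi_feats), self.v(roi_feats)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        # Each local feature is adjusted by a weighted sum over its neighbours,
        # emphasizing structural relations inside the proposal.
        return roi_feats + attn @ v

# Example: refine 64 local features of dimension 128 for a batch of 2 proposals.
refined = RoISelfAttention()(torch.rand(2, 64, 128))
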