ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

Bansal, Mayank; Krizhevsky, Alex; Ogale, Abhijit S.

doi:10.48550/arxiv.1812.03079

Cited by 141 publications (289 citation statements)

References 17 publications

Supporting

Mentioning: 274

Contrasting


“…In [45], [39], [44], a Convolutional Neural Network (CNN) is used to extract scene's features. The learning power of CNN is utilized in [42], [23], [11], [7] to implicitly learn both interactions and the scene semantics. To this end, they render the scene semantics and states of agents in the scene in a multidimensional image and use CNNs to capture the underlying relations between dimensions.…”

Section: Previous Work (mentioning)

confidence: 99%

“…Keeping the input domain structure of neural networks in mind, it is clear that the scene's contextual information should be represented in a way that is digestible for the networks. In most of the previous works on vehicle trajectory prediction [39], [45], [23], [11], [7], scene's contextual information is rendered into image-like raster inputs, and 2D Convolutional Neural Networks (CNN) are employed to learn an abstract representation. This is inspired by the success of CNNs in various computer vision tasks [26], [21], [19], [41], which has made rendered images and CNNs the standard input representation and processor, respectively.…”

Section: Introduction (mentioning)

confidence: 99%
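As a rough illustration of the "image-like raster inputs" described in the two excerpts above, the sketch below renders axis-aligned agent boxes into a multi-channel bird's-eye-view grid that a 2D CNN could consume. The function name `rasterize_agents`, the channel convention, the 64 × 64 grid, and the 0.5 m/pixel resolution are all hypothetical choices for illustration, not the encoding used by ChauffeurNet or by the citing papers.

```python
import math


def rasterize_agents(boxes, num_channels, grid_size=64, resolution=0.5):
    """Render axis-aligned boxes into a multi-channel bird's-eye-view raster.

    boxes: iterable of (channel, x_min, y_min, x_max, y_max) in metres, with the
    ego vehicle at the raster centre. The channel index encodes the semantic
    layer (a hypothetical convention, e.g. 0 = vehicles, 1 = pedestrians).
    resolution: metres per pixel.
    Returns a nested list indexed as raster[channel][row][col], where the row
    index grows with y and the column index grows with x.
    """
    raster = [[[0.0] * grid_size for _ in range(grid_size)]
              for _ in range(num_channels)]
    half_extent = grid_size * resolution / 2.0  # metres from centre to edge
    for channel, x0, y0, x1, y1 in boxes:
        # Metric coordinates -> pixel indices, clipped to the raster bounds so
        # boxes that are partly outside the view are cropped.
        c0 = max(math.floor((x0 + half_extent) / resolution), 0)
        c1 = min(math.ceil((x1 + half_extent) / resolution), grid_size)
        r0 = max(math.floor((y0 + half_extent) / resolution), 0)
        r1 = min(math.ceil((y1 + half_extent) / resolution), grid_size)
        for r in range(r0, r1):  # empty range if the box is fully outside
            for c in range(c0, c1):
                raster[channel][r][c] = 1.0
    return raster


# A 4 m x 2 m vehicle at the centre and a 1 m x 1 m pedestrian nearby.
raster = rasterize_agents(
    [(0, -2.0, -1.0, 2.0, 1.0), (1, 5.0, 5.0, 6.0, 6.0)], num_channels=2)
print([sum(map(sum, layer)) for layer in raster])  # [32.0, 4.0]
```

A CNN would treat the result as a `num_channels × grid_size × grid_size` tensor. Practical systems typically render further layers (for example the road map and agents' past poses), which this sketch omits.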


SVG-Net: An SVG-based Trajectory Prediction Model

Bahari, Zehtab, Khorasani, et al. 2021. Preprint.


Anticipating motions of vehicles in a scene is an essential problem for safe autonomous driving systems. To this end, the comprehension of the scene's infrastructure is often the main clue for predicting future trajectories. Most of the proposed approaches represent the scene with a rasterized format and some of the more recent approaches leverage custom vectorized formats. In contrast, we propose representing the scene's information by employing Scalable Vector Graphics (SVG). SVG is a well-established format that matches the problem of trajectory prediction better than rasterized formats while being more general than arbitrary vectorized formats. SVG has the potential to provide the convenience and generality of raster-based solutions if coupled with a tool as powerful as CNNs, which is why we introduce SVG-Net. SVG-Net is a Transformer-based Neural Network that can effectively capture the scene's information from SVG inputs. Thanks to the self-attention mechanism in its Transformers, SVG-Net can also adequately capture relations between the scene and the agents. We demonstrate SVG-Net's effectiveness by evaluating its performance on the publicly available Argoverse forecasting dataset. Finally, we illustrate how, by using SVG, one can benefit from datasets and advancements in other research fronts that also utilize the same input format. Our code is available at https://vita-epfl.github.io/SVGNet/.
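To make the idea of "representing the scene's information by employing SVG" concrete, the sketch below serializes lane centerlines and agent histories as SVG `<path>` elements built from `M` (move-to) and `L` (line-to) commands, then parses the result back with `xml.etree.ElementTree` to confirm it is well-formed. The helper names and the use of stroke colour as a semantic tag are hypothetical; the abstract does not specify how SVG-Net labels scene elements.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def polyline_to_path(points, stroke):
    """Encode a polyline as an SVG <path> using M (move-to) and L (line-to) commands."""
    if not points:
        raise ValueError("a polyline needs at least one point")
    (x0, y0), rest = points[0], points[1:]
    commands = [f"M {x0:g} {y0:g}"] + [f"L {x:g} {y:g}" for x, y in rest]
    d = " ".join(commands)
    return f'<path d="{d}" stroke="{stroke}" fill="none"/>'


def scene_to_svg(lanes, trajectories, size=100):
    """Serialize lane centerlines and past agent trajectories into one SVG document.

    Stroke colour is used here as a hypothetical semantic tag
    (gray = lane, red = agent history).
    """
    paths = [polyline_to_path(lane, "gray") for lane in lanes]
    paths += [polyline_to_path(traj, "red") for traj in trajectories]
    body = "\n  ".join(paths)
    return (f'<svg xmlns="{SVG_NS}" viewBox="0 0 {size} {size}">\n'
            f"  {body}\n</svg>")


svg = scene_to_svg(lanes=[[(0, 50), (100, 50)]],
                   trajectories=[[(10, 52), (20, 52.5), (30, 53)]])
print(svg)

# Parse it back: the scene is now ordinary XML that generic SVG tooling can read.
root = ET.fromstring(svg)
print([p.get("d") for p in root.iter(f"{{{SVG_NS}}}path")])
```

Because each element is a sequence of drawing commands rather than pixels, a sequence model such as a Transformer can consume the commands directly, which is the property the abstract contrasts with rasterized inputs.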



“…In relation to this distinction, a categorization by different input modalities to raster-, polyline-, or sensor-based approaches can be made. Raster-based approaches [10], [5], [11], [12], [13], [14] take in Bird's Eye View (BEV) images of an agent's environment with elements such as road lanes and bounding boxes of other perceived agents. By doing so, the CNN network can extract local spatial information from a single representational domain.…”

Section: Related Work (mentioning)

confidence: 99%

Self-Supervised Action-Space Prediction for Automated Driving

Janjoš, Dolgov, Zöllner. 2021. Preprint.


Making informed driving decisions requires reliable prediction of other vehicles' trajectories. In this paper, we present a novel learned multi-modal trajectory prediction architecture for automated driving. It achieves kinematically feasible predictions by casting the learning problem into the space of accelerations and steering angles; by performing action-space prediction, we can leverage valuable model knowledge. Additionally, the dimensionality of the action manifold is lower than that of the state manifold, whose intrinsically correlated states are more difficult to capture in a learned manner. For the purpose of action-space prediction, we present the simple Feed-Forward Action-Space Prediction (FFW-ASP) architecture. We then build on this notion and introduce the novel Self-Supervised Action-Space Prediction (SSP-ASP) architecture, which outputs future environment context features in addition to trajectories. A key element of the self-supervised architecture is that, based on an observed action history and past context features, future context features are predicted prior to future trajectories. The proposed methods are evaluated on real-world datasets containing urban intersections and roundabouts; they show accurate predictions and outperform the state of the art for kinematically feasible predictions in several prediction metrics.
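The abstract's central idea, predicting accelerations and steering angles and then converting them into positions, can be sketched with a kinematic bicycle model. The choice of model, the rear-axle reference point, forward-Euler integration, the 2.7 m wheelbase, and the 0.1 s step are assumptions made for this illustration; the abstract does not state which vehicle model FFW-ASP or SSP-ASP uses.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    x: float        # position [m]
    y: float        # position [m]
    heading: float  # yaw angle [rad]
    speed: float    # longitudinal speed [m/s]


def rollout(state, actions, wheelbase=2.7, dt=0.1):
    """Turn a sequence of (acceleration [m/s^2], steering angle [rad]) actions
    into future (x, y) positions with a kinematic bicycle model (rear-axle
    reference point, forward-Euler integration). Assumed model, see lead-in."""
    positions = []
    for accel, steer in actions:
        state = State(
            x=state.x + state.speed * math.cos(state.heading) * dt,
            y=state.y + state.speed * math.sin(state.heading) * dt,
            heading=state.heading + state.speed / wheelbase * math.tan(steer) * dt,
            speed=max(state.speed + accel * dt, 0.0),  # no reversing in this sketch
        )
        positions.append((state.x, state.y))
    return positions


# Two seconds at 10 Hz: gentle acceleration with a slight left steer.
actions = [(0.5, 0.05)] * 20
print(rollout(State(0.0, 0.0, 0.0, 10.0), actions)[-1])
```

Each step is described by only two numbers, and every action sequence maps to a trajectory consistent with the assumed vehicle model (as long as steering angles stay within realistic bounds), which is the kind of built-in model knowledge the abstract refers to.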


“…3D perception is an important part of the automatic driving system [1,11]. At present, many 3D detectors have been proposed.…”

Section: Introduction (mentioning)

confidence: 99%

“…The main contributions of this work can be summarized as follows: (1) We show that the essential difference between point-based and voxel-based RoI feature extractors is the trade-off between location and structure information, and we point out that, compared to accurate position information, structural information is more important for 3D object detection; (2) We propose a plug-and-play module called Self-Attention RoI Feature Extractor (SARFE) that automatically adjusts local features in a proposal to obtain features with stronger structure information; (3) Our proposed method SARFE achieves new state-of-the-art performance on the cyclist class of the KITTI dataset [8] while keeping real-time capability.…”

Section: Introduction (mentioning)

confidence: 99%
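The excerpt above describes a self-attention module that adjusts the local features inside a proposal. The sketch below shows only the generic mechanism, single-head scaled dot-product self-attention over the N feature vectors of one RoI, written in plain Python. The function names and the identity projection matrices in the example are hypothetical, and this is not SARFE's actual architecture, which the excerpt does not detail.

```python
import math


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over N feature vectors.

    features: N x d; w_q, w_k: d x d_k; w_v: d x d_v. Each output row is a
    weighted mix of all value rows, so every feature in the RoI can be
    adjusted using every other feature in the same RoI.
    """
    q, k, v = matmul(features, w_q), matmul(features, w_k), matmul(features, w_v)
    d_k = len(w_q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K: d_k x N
    scores = matmul(q, k_t)               # N x N similarity matrix
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)             # N x d_v


# Three 2-D point features from one hypothetical proposal, identity projections.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(feats, eye, eye, eye)
print(out)
```

Self-attention is permutation-equivariant: reordering the points only reorders the outputs, which suits unordered point or voxel features inside a proposal.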

Structure Information is the Key: Self-Attention RoI Feature Extractor in 3D Object Detection

Zhang, Zheng, et al. 2021. Preprint.


Unlike 2D object detection, where all RoI features come from grid pixels, RoI feature extraction in 3D point cloud object detection is more diverse. In this paper, we first compare and analyze the differences in structure and performance between the two state-of-the-art models PV-RCNN and Voxel-RCNN. We find that the performance gap between the two models does not come from point information but from structural information. The voxel features contain more structural information because they quantize the point cloud instead of downsampling it, so they retain essentially the complete information of the whole point cloud. In our experiments, the stronger structural information in voxel features gives the detector higher performance even though the voxel features lack accurate location information. We therefore argue that structural information is the key to 3D object detection. Based on this conclusion, we propose a Self-Attention RoI Feature Extractor (SARFE) to enhance the structural information of the features extracted from 3D proposals. SARFE is a plug-and-play module that can easily be used with existing 3D detectors. We evaluate SARFE on both the KITTI dataset and the Waymo Open dataset. With the newly introduced SARFE, we improve the performance of state-of-the-art 3D detectors on the KITTI cyclist class by a large margin while keeping real-time capability. The code will be released at https://github.com/Poley97/SARFE.
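The abstract contrasts voxel quantization with point downsampling. The sketch below illustrates that distinction under a simple assumption: every point is assigned to a voxel by flooring its coordinates, and each voxel's feature is the mean of its points, so no point is discarded (random downsampling to k points would simply drop the rest). The function name `voxelize` and the mean-pooling feature are hypothetical choices; real voxel-based detectors use richer voxel encoders.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Quantize 3D points into voxels and average the points in each voxel.

    Returns {(i, j, k): (mean_x, mean_y, mean_z)}. math.floor (not int) is used
    so negative coordinates land in the correct voxel. Every input point
    contributes to exactly one voxel.
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        buckets[key].append((x, y, z))
    return {key: tuple(sum(c) / len(pts) for c in zip(*pts))
            for key, pts in buckets.items()}


pts = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.2, 0.0, 0.0)]
vox = voxelize(pts, voxel_size=0.5)
print(sorted(vox))
```

Averaging loses the exact position of each point inside a voxel, which mirrors the trade-off the abstract describes: voxel features keep the overall structure of the cloud at the cost of precise location information.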

