Spatial-Temporal Attentive Motion Planning Network for Autonomous Vehicles

Ayalew, Melese; Zhou, Shijie; Assefa, Maregu; Yilma, Getinet

doi:10.1109/iccwamtip53232.2021.9674096

Cited by 9 publications

(10 citation statements)

References 10 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Although these methods dealt with learning attentive features, which are mostly applied to classification and segmentation tasks, they do not take into account the simultaneous acquisition of spatiotemporal attention for sequential decision-making problems. Recently, STAMPNet [24] applied the squeeze-and-excitation (SE) module [49] in their feature extractor and 3D-ResNet [53] to learn attended intermediate features of the video and trajectory history for trajectory planning. Following this work, we introduced the SE module into our 3DCNN feature extractor, but instead of using a single backbone, we use a Siamese backbone to simultaneously learn intermediate spatiotemporal features that are invariant across (front and top) driving views.…”

Section: Attention-based Methodsmentioning

confidence: 99%

“…This section describes the general overview of the proposed ViSTAMPCNet for autonomous vehicles, shown in Figure 1. In general, the ViSTAMPCNet is based on imitation learning using CNN-LSTM models [16,22,24], which involves a mapping from expert observations to view-invariant spatiotemporal representations. Then, the view-invariant spatiotemporal representations are used for driving decision making, i.e., driving control command and future trajectory generation.…”

Section: Overviewmentioning

confidence: 99%

“…where T i+1 is the predicted future trajectory. To optimize the trajectory planning model, we use the MSE [24] objective function over T c i and the predicted trajectory T i+1 as follows:…”

Section: Trajectory Planning and Control Modulementioning

confidence: 99%

“…For evaluation, we adopt the L 2 loss (Equation ( 8)) and top-1 accuracy metrics to validate the trajectory generation performance and high-level control command classification accuracy following VTG-Net [22] and FCN-LSTM [16], respectively. We use trajectory generation results and control accuracy obtained from prior works, such as FCN-LSTM [16], CNNState-FCN [20], VTGNet [22], and STAMPNet [24] as baselines.…”

Section: Dataset and Evaluation Metricsmentioning

confidence: 99%

“…Consequently, learned decision-making in such approaches is typically limited to the driving environment or tasks in which it was trained [16]. Moreover, driving decisions made in such an approach lack interpretability due to the black-box nature of end-to-end learning models [24]. To improve the robustness of these methods, many researchers have proposed more advanced deep learning models that can leverage multi-modal and multi-view information [25][26][27][28].…”

Section: Introductionmentioning

confidence: 99%

See 4 more Smart Citations

View-Invariant Spatiotemporal Attentive Motion Planning and Control Network for Autonomous Vehicles

et al. 2022

Self Cite

View full text Add to dashboard Cite

Autonomous driving vehicles (ADVs) are sleeping giant intelligent machines that perceive their environment and make driving decisions. Most existing ADSs are built as hand-engineered perception-planning-control pipelines. However, designing generalized handcrafted rules for autonomous driving in an urban environment is complex. An alternative approach is imitation learning (IL) from human driving demonstrations. However, most previous studies on IL for autonomous driving face several critical challenges: (1) poor generalization ability toward the unseen environment due to distribution shift problems such as changes in driving views and weather conditions; (2) lack of interpretability; and (3) mostly trained to learn the single driving task. To address these challenges, we propose a view-invariant spatiotemporal attentive planning and control network for autonomous vehicles. The proposed method first extracts spatiotemporal representations from images of a front and top driving view sequence through attentive Siamese 3DResNet. Then, the maximum mean discrepancy loss (MMD) is employed to minimize spatiotemporal discrepancies between these driving views and produce an invariant spatiotemporal representation, which reduces domain shift due to view change. Finally, the multitasking learning (MTL) method is employed to jointly train trajectory planning and high-level control tasks based on learned representations and previous motions. Results of extensive experimental evaluations on a large autonomous driving dataset with various weather/lighting conditions verified that the proposed method is effective for feasible motion planning and control in autonomous vehicles.

show abstract

Section: Attention-based Methodsmentioning

confidence: 99%

Section: Overviewmentioning

confidence: 99%

“…where T i+1 is the predicted future trajectory. To optimize the trajectory planning model, we use the MSE [24] objective function over T c i and the predicted trajectory T i+1 as follows:…”

Section: Trajectory Planning and Control Modulementioning

confidence: 99%