2020 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra40945.2020.9197340
Multi-Head Attention for Multi-Modal Joint Vehicle Motion Forecasting

Cited by 141 publications (122 citation statements) | References 15 publications
“…trajectories. Mercat et al. [10] use a long short-term memory (LSTM) [18] to predict multiple future trajectories per traffic actor, jointly for all traffic actors in a scene, and score the trajectories using self-attention layers. The approach in [19] also utilizes LSTMs and attention for jointly predicting trajectories, but introduces latent variables for generating multiple trajectories per traffic actor.…”
Section: Related Work
confidence: 99%
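The statement above describes an LSTM encoding an actor's past trajectory and several output heads decoding one future trajectory per mode. A minimal numpy sketch of that encode-then-decode pattern follows; all weights are random placeholders for learned parameters, and the mode-scoring step (done with self-attention in the cited work) is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(traj, W, U, b):
    """Run a single-layer LSTM over past (x, y) positions; return the final hidden state."""
    d = U.shape[1]
    h, c = np.zeros(d), np.zeros(d)
    for x in traj:                                    # traj: (T_past, 2)
        z = W @ x + U @ h + b                         # stacked gates: [input, forget, cell, output]
        i, f = sigmoid(z[:d]), sigmoid(z[d:2*d])
        g, o = np.tanh(z[2*d:3*d]), sigmoid(z[3*d:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def predict_modes(traj, n_modes=3, t_future=5, hidden=16, seed=0):
    """Decode one future trajectory per mode from a shared LSTM encoding.

    Random weights stand in for learned parameters; a real model would
    also score the modes, e.g. with self-attention layers.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((4 * hidden, 2)) * 0.1
    U = rng.standard_normal((4 * hidden, hidden)) * 0.1
    b = np.zeros(4 * hidden)
    h = lstm_encode(traj, W, U, b)
    heads = rng.standard_normal((n_modes, t_future * 2, hidden)) * 0.1
    return np.stack([(P @ h).reshape(t_future, 2) for P in heads])   # (n_modes, T_future, 2)
```

Each head is a linear projection of the shared encoding, which is the simplest way to obtain a fixed number of distinct trajectory hypotheses per actor.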
“…Recently, advances in neural networks [8] as well as the availability of motion forecasting datasets, e.g. [9], gave rise to approaches based on neural networks [10], [11], [12], [13].…”
Section: Introduction
confidence: 99%
“…Social Interaction: Similar to previous work [17], we model the interaction between agents with multi-headed attention [22]. Each agent is represented with an embedding vector that encodes its observed trajectory and is used to create a key K, query Q, and value vector V for each of the H attention heads.…”
Section: A Network Structure
confidence: 99%
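The mechanism in the statement above, each agent embedding projected to per-head key, query, and value vectors so every agent can attend to every other, can be sketched in a few lines of numpy. The weight matrices are random placeholders for learned projections; this is an illustration of scaled dot-product multi-head attention, not the cited model's exact parameterization.

```python
import numpy as np

def multi_head_attention(E, H, d_k, seed=0):
    """Toy multi-head self-attention over agent embeddings.

    E: (n_agents, d_model) per-agent trajectory embeddings.
    H: number of attention heads; d_k: per-head key/query/value dimension.
    """
    n, d_model = E.shape
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(H):
        Wq = rng.standard_normal((d_model, d_k))
        Wk = rng.standard_normal((d_model, d_k))
        Wv = rng.standard_normal((d_model, d_k))
        Q, K, V = E @ Wq, E @ Wk, E @ Wv             # per-head projections
        scores = Q @ K.T / np.sqrt(d_k)              # scaled dot-product scores
        attn = np.exp(scores - scores.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax over agents
        outputs.append(attn @ V)                     # each agent aggregates all agents' values
    return np.concatenate(outputs, axis=1)           # (n_agents, H * d_k)
```

Concatenating the H head outputs yields one interaction-aware feature vector per agent, which a forecasting network can then decode into trajectories.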
“…A different line of work has used recurrent neural networks (RNN) to build a representation for the trajectories [17], [13]. On top of the recurrent architecture, contextual information is added to model the interaction between agents and to extract the road structure.…”
Section: Introduction
confidence: 99%