2021
DOI: 10.48550/arxiv.2103.14023
Preprint
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting

Abstract: Predicting accurate future trajectories of multiple agents is essential for autonomous systems but is challenging due to the complex interaction between agents and the uncertainty in each agent's future behavior. Forecasting multi-agent trajectories requires modeling two key dimensions: (1) the time dimension, where we model the influence of past agent states over future states; (2) the social dimension, where we model how the state of each agent affects others. Most prior methods model these two dimensions separately, …
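The abstract's central idea — modeling the time and social dimensions jointly rather than separately — can be illustrated with a small sketch. The snippet below flattens all agents' states across timesteps into one sequence and applies attention that uses different projections for same-agent and other-agent pairs, so temporal and social influence are handled in a single operation. This is an illustration of the idea only, not the authors' implementation; all weight matrices are random stand-ins for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def agent_aware_attention(states, agent_ids, d=16, seed=0):
    """Toy socio-temporal attention over a flattened (time x agent) sequence.

    states:    (L, d_in) agent states, all timesteps of all agents in one sequence
    agent_ids: (L,) which agent each row belongs to
    Separate query/key projections are used for same-agent vs. other-agent
    pairs, so attention can distinguish an agent's own past (temporal
    influence) from the states of other agents (social influence).
    """
    rng = np.random.default_rng(seed)
    L, d_in = states.shape
    # Random projections stand in for learned weights (illustration only).
    Wq_self, Wk_self = rng.standard_normal((2, d_in, d))
    Wq_other, Wk_other = rng.standard_normal((2, d_in, d))
    Wv = rng.standard_normal((d_in, d))

    same = agent_ids[:, None] == agent_ids[None, :]              # (L, L) mask
    scores_self = (states @ Wq_self) @ (states @ Wk_self).T      # intra-agent
    scores_other = (states @ Wq_other) @ (states @ Wk_other).T   # inter-agent
    scores = np.where(same, scores_self, scores_other) / np.sqrt(d)
    return softmax(scores, axis=-1) @ (states @ Wv)              # (L, d)

# Two agents, three timesteps each, flattened into one length-6 sequence.
states = np.random.default_rng(1).standard_normal((6, 4))
ids = np.array([0, 1, 0, 1, 0, 1])  # rows interleaved by timestep
out = agent_aware_attention(states, ids)
print(out.shape)  # (6, 16)
```

Because the sequence mixes timesteps and agents, every attention row can draw on both an agent's own history and its neighbors' states at once, which is the coupling the abstract argues prior separate-dimension models lose.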

Cited by 8 publications (21 citation statements)
References 46 publications
“…In inference, we use greedy decoding to decode the future trajectory, and based on the experiments, using greedy decoding in training is able to increase the trajectory prediction accuracy during inference. This increase is also observed in [22]. Therefore, in both training and inference, the decoder decodes the future trajectory of the camera wearer in a greedy autoregressive manner.…”
Section: Model Structure
confidence: 82%
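The citation statement above describes greedy autoregressive decoding: the model predicts one future step at a time and feeds each prediction back as input for the next. A minimal sketch of that loop, using a hypothetical constant-velocity one-step model purely as a stand-in for a learned predictor:

```python
import numpy as np

def greedy_autoregressive_decode(history, step_fn, horizon):
    """Decode a future trajectory one step at a time, feeding each prediction
    back as input to the next step (greedy: no sampling, no beam search)."""
    traj = list(history)
    preds = []
    for _ in range(horizon):
        nxt = step_fn(np.asarray(traj))  # one-step model: past -> next position
        preds.append(nxt)
        traj.append(nxt)                 # autoregressive feedback
    return np.asarray(preds)

# Hypothetical one-step model: extrapolate constant velocity
# from the last two observed/predicted positions.
def constant_velocity_step(traj):
    return traj[-1] + (traj[-1] - traj[-2])

history = [np.array([0.0, 0.0]), np.array([1.0, 0.5])]
future = greedy_autoregressive_decode(history, constant_velocity_step, horizon=3)
print(future)  # [[2. 1.] [3. 1.5] [4. 2.]]
```

Swapping the greedy `step_fn` for a sampling one would yield diverse futures; the quoted work reports that committing to the greedy step during training as well as inference improves accuracy.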
“…STAR [21] predicts pedestrian trajectories with only the attention mechanism, which is achieved by a graph-based spatial transformer and a temporal transformer. AgentFormer [22], a transformer-based framework, jointly models temporal and social dimensions in human motion dynamics to predict future trajectories. Our model is also based on transformer, but we differ from them in that 1) we target for egocentric scenarios, and 2) our model encodes multiple modalities with a novel cascaded cross-attention mechanism, whereas their models are designed to use the past trajectories as the only cue for the future trajectory prediction.…”
Section: A Non-egocentric Human Trajectory Prediction
confidence: 99%
“…Recently, people have found transformers to have stronger encoding abilities and the advantage of avoiding recursion [13]. The current state-of-the-art algorithm of predicting human trajectories on ETH dataset, AgentFormer, uses transformer from end to end [14]. Therefore, we also experiment using transformer structure as encoder in addition to lstm to extract information from observed trajectories.…”
Section: Related Work
confidence: 99%
“…In general, researchers use transformers to process the node embedding in two orthogonal directions: first, through the node-wise residual feature transformation, an arbitrary type of intra-node transformation is enabled [18,47,48]; second, through the attention mechanism, features from different nodes are dynamically aggregated and the inter-nodes relationships are captured [48]. Previous efforts have shown the potential of transformers in multi-agent system [49], by flattening connections features across time and agents.…”
Section: Introduction
confidence: 99%
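The two "orthogonal directions" named in the last citation statement — a node-wise residual feature transformation and attention-based inter-node aggregation — correspond to the two sublayers of a standard transformer block. A minimal sketch under that reading, with random matrices standing in for learned weights:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(nodes, d=8, seed=0):
    """Sketch of the two directions described above:
    (1) attention dynamically aggregates features across nodes, capturing
        inter-node relationships;
    (2) a node-wise residual feed-forward transform is applied to each node
        independently (intra-node transformation)."""
    rng = np.random.default_rng(seed)
    n, d_in = nodes.shape
    Wq, Wk, Wv = rng.standard_normal((3, d_in, d_in))
    # (1) inter-node aggregation via attention, with a residual connection
    attn = softmax((nodes @ Wq) @ (nodes @ Wk).T / np.sqrt(d_in), axis=-1)
    nodes = nodes + attn @ (nodes @ Wv)
    # (2) node-wise residual feature transformation (same MLP for every node)
    W1 = rng.standard_normal((d_in, d))
    W2 = rng.standard_normal((d, d_in))
    nodes = nodes + np.maximum(nodes @ W1, 0) @ W2  # ReLU MLP + residual
    return nodes

out = transformer_block(np.random.default_rng(1).standard_normal((5, 4)))
print(out.shape)  # (5, 4)
```

In the multi-agent setting the citation describes, the "nodes" would be agent states flattened across time and agents, so the same block captures connections across both axes.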