“…With the recent success of deep networks, RNN-based approaches have become prevalent. These works propose to model interactions among multiple agents by applying aggregation functions on their RNN hidden states [1,15,19], running convolutional layers on agents' spatial feature maps [5,10,64,58], or leveraging attention mechanisms or relational reasoning on constructed graphs of agents [27,50,51,63,57]. Some recent studies are, however, rethinking the use of RNN and social information in modeling temporal dependencies and borrowing the idea of transformers into the area [13].…”