“…More recently, a variety of neural networks have been explored for learning socially-aware motion representations [5,43]. Some peculiar neural architecture designs, such as feature pooling [3,21,27], attention mechanism [12,34,75,86], and spatial-temporal graph [36,41,50,81], have yielded promising results in crowded environments. However, the robustness of these methods remains a central concern.…”