“…In this field, previous works have used 2D poses [2], [3], pedestrian bounding boxes [4], optical flow [5], scene context [6], vehicle speed [7], trajectories [8], and the ego-motion of vehicles [7]. Meanwhile, deep learning models such as I3D [5], LSTM/RNN-based temporal models [8], [9], and Transformers [10] have been adopted in recent years. However, because of the high mobility of pedestrians, the prediction results reported by previous works do not agree with one another [11], especially at the starting time, when pedestrians appear at a small scale.…”