Multi-Modal Hybrid Architecture for Pedestrian Action Prediction

Rasouli, Amir; Yau, Tiffany; Rohani, Mohsen; Luo, Jun

doi:10.48550/arxiv.2012.00514

Cited by 5 publications

(8 citation statements)

References 38 publications

“…More complex features are input at the bottom of the model and simpler features are input at the top. In [10], a multi-modal prediction network is proposed, which uses four feature elements: global semantic map, local scene, pedestrian motion and vehicle speed. These features are gradually integrated into the network at different processing levels.…”

Section: Related Work (mentioning)

confidence: 99%

“…Ideally, the higher precision and recall, the better, but the actual situation is that the two affect each other: the pursuit of high accuracy rate will lead to low recall rate; the pursuit of high recall rate will usually reduce the accuracy rate. In order to balance the accuracy and recall rates, the F1 parameter is introduced, and its calculation formula is shown in Equation (10).…”

Section: Benchmark and Metrics (mentioning)

confidence: 99%
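
The citing paper's Equation (10) is not reproduced in this excerpt. As a minimal sketch, assuming it is the standard F1 definition (the harmonic mean of precision and recall), the trade-off described in the statement above can be computed as follows. The function name and the counts are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts. Assumes the standard definitions; the
    citing paper's own Equation (10) is not shown in the excerpt."""
    # Precision: fraction of predicted crossings that were real crossings.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of real crossings that were predicted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean, which is pulled toward the lower of the two values.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: a conservative predictor with high precision but low recall.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.3f}")
```

Because the harmonic mean sits closer to the smaller of its two inputs, a model cannot reach a high F1 by maximizing precision or recall alone.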

“…Later studies considered both spatial and temporal features, using recurrent neural networks (RNN) and three-dimensional convolution neural network (3DCNN) to extract spatio-temporal information [5][6][7][8][9]. At the same time, different methods are used to fuse a variety of features to predict pedestrians' crossing behaviors, such as pedestrian bounding box, posture, vehicle speed and surrounding environment information [10][11][12][13][14].…”

Section: Introduction (mentioning)

confidence: 99%

Prediction of Pedestrian Crossing Behavior Based on Surveillance Video

Zhou, Ren, Zhang, et al. 2022

Sensors

Prediction of pedestrian crossing behavior is an important issue faced by the realization of autonomous driving. The current research on pedestrian crossing behavior prediction is mainly based on vehicle camera. However, the sight line of vehicle camera may be blocked by other vehicles or the road environment, making it difficult to obtain key information in the scene. Pedestrian crossing behavior prediction based on surveillance video can be used in key road sections or accident-prone areas to provide supplementary information for vehicle decision-making, thereby reducing the risk of accidents. To this end, we propose a pedestrian crossing behavior prediction network for surveillance video. The network integrates pedestrian posture, local context and global context features through a new cross-stacked gated recurrence unit (GRU) structure to achieve accurate prediction of pedestrian crossing behavior. Applied onto the surveillance video dataset from the University of California, Berkeley to predict the pedestrian crossing behavior, our model achieves the best results regarding accuracy, F1 parameter, etc. In addition, we conducted experiments to study the effects of time to prediction and pedestrian speed on the prediction accuracy. This paper proves the feasibility of pedestrian crossing behavior prediction based on surveillance video. It provides a reference for the application of edge computing in the safety guarantee of automatic driving.

“…Nevertheless, this method does not take global context into account. [8] proposed a multi-modal based prediction system that integrates four feature sources (local scene, semantic map, pedestrian motion, and ego-motion). The global context (semantic map) is utilized, but it lacks other important features such as human pose.…”

Section: Related Work (mentioning)

confidence: 99%

“…Later on, with the maturity of recurrent neural networks (RNNs), pedestrian crossing intention was predicted by considering both the spatial and temporal information [2], [3], [4]. This led to different ways of fusing different features, e.g., the detected pedestrian bounding boxes, poses, appearance, and even the ego-vehicle information [5], [6], [7], [8], [9]. The most recent benchmark of pedestrian intention prediction was released by [10], in which the PCPA model achieved the state-of-the-art in the most popular dataset JAAD [1].…”

Section: Introduction (mentioning)

confidence: 99%

Predicting Pedestrian Crossing Intention with Feature Fusion and Spatio-Temporal Attention

Yang, Zhang, Yurtsever, et al. 2021

Preprint

Predicting vulnerable road user behavior is an essential prerequisite for deploying Automated Driving Systems (ADS) in the real-world. Pedestrian crossing intention should be recognized in real-time, especially for urban driving. Recent works have shown the potential of using vision-based deep neural network models for this task. However, these models are not robust and certain issues still need to be resolved. First, the global spatio-temporal context that accounts for the interaction between the target pedestrian and the scene has not been properly utilized. Second, the optimum strategy for fusing different sensor data has not been thoroughly investigated. This work addresses the above limitations by introducing a novel neural network architecture to fuse inherently different spatio-temporal features for pedestrian crossing intention prediction. We fuse different phenomena such as sequences of RGB imagery, semantic segmentation masks, and ego-vehicle speed in an optimum way using attention mechanisms and a stack of recurrent neural networks. The optimum architecture was obtained through exhaustive ablation and comparison studies. Extensive comparative experiments on the JAAD pedestrian action prediction benchmark demonstrate the effectiveness of the proposed method, where state-of-the-art performance was achieved. Our code is open-source and publicly available: https://github.com/OSU-Haolin/Pedestrian_Crossing_Intention_Prediction.

Learning Trajectory-Conditioned Relations to Predict Pedestrian Crossing Behavior

Zhou, AlRegib, Parchami, et al. 2022

2022 IEEE International Conference on Image Processing (ICIP)

In smart transportation, intelligent systems avoid potential collisions by predicting the intent of traffic agents, especially pedestrians. Pedestrian intent, defined as future action, e.g., start crossing, can be dependent on traffic surroundings. In this paper, we develop a framework to incorporate such dependency given observed pedestrian trajectory and scene frames. Our framework first encodes regional joint information between a pedestrian and surroundings over time into feature-map vectors. The global relation representations are then extracted from pairwise feature-map vectors to estimate intent with past trajectory condition. We evaluate our approach on two public datasets and compare against two state-of-the-art approaches. The experimental results demonstrate that our method helps to inform potential risks during crossing events with 0.04 improvement in F1-score on JAAD dataset and 0.01 improvement in recall on PIE dataset. Furthermore, we conduct ablation experiments to confirm the contribution of the relation extraction in our framework.
