2020
DOI: 10.48550/arxiv.2012.00514
Preprint

Multi-Modal Hybrid Architecture for Pedestrian Action Prediction

Abstract: Pedestrian behavior prediction is one of the major challenges for intelligent driving systems in urban environments. Pedestrians often exhibit a wide range of behaviors and adequate interpretations of those depend on various sources of information such as pedestrian appearance, states of other road users, the environment layout, etc. To address this problem, we propose a novel multi-modal prediction algorithm that incorporates different sources of information captured from the environment to predict future cro…

Cited by 5 publications (8 citation statements)
References 38 publications
“…More complex features are input at the bottom of the model and simpler features are input at the top. In [10], a multi-modal prediction network is proposed, which uses four feature elements: global semantic map, local scene, pedestrian motion and vehicle speed. These features are gradually integrated into the network at different processing levels.…”
Section: Related Work (mentioning)
confidence: 99%
“…Ideally, the higher precision and recall, the better, but the actual situation is that the two affect each other: the pursuit of high accuracy rate will lead to low recall rate; the pursuit of high recall rate will usually reduce the accuracy rate. In order to balance the accuracy and recall rates, the F1 parameter is introduced, and its calculation formula is shown in Equation (10).…”
Section: Benchmark and Metrics (mentioning)
confidence: 99%
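The citing paper's Equation (10) is not reproduced in the statement above. As a minimal sketch, assuming it is the standard F1 definition (the harmonic mean of precision and recall, F1 = 2PR / (P + R)) and that "accuracy rate" in the quote refers to precision, the trade-off can be illustrated as follows. The function names and the example counts are hypothetical and do not come from the cited work.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Compute (precision, recall) from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for a ratio whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall.

    Assumed here to match the citing paper's Equation (10).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct crossing predictions, 2 false alarms,
# 8 missed crossings. Precision is high, recall is low.
p, r = precision_recall(tp=8, fp=2, fn=8)
f1 = f1_score(p, r)
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high F1 by maximizing precision or recall alone.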
“…Nevertheless, this method does not take global context into account. [8] proposed a multi-modal based prediction system that integrates four feature sources (local scene, semantic map, pedestrian motion, and ego-motion). The global context (semantic map) is utilized, but it lacks other important features such as human pose.…”
Section: Related Work (mentioning)
confidence: 99%
“…Later on, with the maturity of recurrent neural networks (RNNs), pedestrian crossing intention was predicted by considering both the spatial and temporal information [2], [3], [4]. This led to different ways of fusing different features, e.g., the detected pedestrian bounding boxes, poses, appearance, and even the ego-vehicle information [5], [6], [7], [8], [9]. The most recent benchmark of pedestrian intention prediction was released by [10], in which the PCPA model achieved the state-of-the-art in the most popular dataset JAAD [1].…”
Section: Introduction (mentioning)
confidence: 99%