2021
DOI: 10.3390/s21175694

CAPformer: Pedestrian Crossing Action Prediction Using Transformer

Abstract: Anticipating pedestrian crossing behavior in urban scenarios is a challenging task for autonomous vehicles. Early this year, a benchmark comprising the JAAD and PIE datasets was released, and several state-of-the-art methods have been ranked on it. However, most of the ranked temporal models rely on recurrent architectures. We propose, to the best of our knowledge, the first self-attention alternative, based on the transformer architecture, which has had enormous success in natural language pr…
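The abstract contrasts recurrent temporal models with self-attention. The sketch below shows the core operation: single-head scaled dot-product self-attention, as defined for the original Transformer. In this operation, every frame in an observation window attends to every other frame in one step instead of being processed sequentially. It is not CAPformer's actual architecture, because its layers, inputs and hyperparameters are not given here. The sequence length, the per-frame feature layout and the projection sizes are hypothetical, and the projections are random rather than learned.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a frame sequence.

    x: (T, d_model) array, one feature vector per observed frame.
    w_q, w_k, w_v: (d_model, d_k) projections (random here, learned in practice).
    Returns (output of shape (T, d_k), attention weights of shape (T, T)).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over the key (column) axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of all frames' value vectors.
    return weights @ v, weights

rng = np.random.default_rng(0)
# Hypothetical sizes: 16 frames, 4 features per frame (e.g. bounding-box x1, y1, x2, y2).
T, d_model, d_k = 16, 4, 8
x = rng.normal(size=(T, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (16, 8) (16, 16)
```

Unlike an LSTM, nothing here depends on processing frame t-1 before frame t. The (T, T) weight matrix makes explicit how much each frame draws on every other frame.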

Cited by 24 publications (15 citation statements)
References 29 publications
“…Our proposed model suffers from a hard reduction of training samples, compared to [15] which limits its performance on larger datasets, e.g., JAAD_all. Nevertheless, our model is on par with CAPformer [17] and outperforms TrouSPI-Net [19] for both subsets.…”
Section: Methods (mentioning)
confidence: 86%
“…Furthermore, we compare our model, at different anticipation times, against PCPA [15] which uses fixed and overlapped observations of 0.5 s in the range [2 − 1] s before the event. For our purposes, this model is modified to output earlier predictions in the range [4 − 1] s. We also consider CAPformer [17] and TrouSPI-Net [19] models.…”
Section: Methods (mentioning)
confidence: 99%
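The sampling described in the excerpt above can be sketched as follows. It takes fixed 0.5 s observations at overlapping positions whose end falls between 2 s and 1 s before the crossing event, then widens the range to 4 s to 1 s. The helper is hypothetical and is not code from PCPA or the citing work. The frame rate (30 fps), the overlap ratio (0.6) and the end-exclusive frame indexing are assumptions chosen for illustration.

```python
def observation_windows(event_frame, fps=30, obs_s=0.5,
                        tte_range_s=(1.0, 2.0), overlap=0.6):
    """Hypothetical helper: fixed-length, overlapping observation windows.

    Each window covers frames start..end-1; its time-to-event (TTE) is
    (event_frame - end) / fps and must lie within tte_range_s (seconds).
    fps, overlap and the end-exclusive convention are assumptions.
    """
    obs_len = round(obs_s * fps)                    # frames per observation
    step = max(1, round(obs_len * (1 - overlap)))   # stride between windows
    t_min, t_max = tte_range_s
    earliest_end = event_frame - round(t_max * fps) # largest allowed TTE
    latest_end = event_frame - round(t_min * fps)   # smallest allowed TTE
    windows = []
    end = earliest_end
    while end <= latest_end:
        start = end - obs_len
        if start >= 0:  # skip windows that would begin before the track does
            windows.append((start, end))
        end += step
    return windows

# Usage: a crossing event at frame 150 of a hypothetical 30 fps track.
for start, end in observation_windows(150):
    print(start, end, (150 - end) / 30)  # window bounds and TTE in seconds
```

Passing `tte_range_s=(1.0, 4.0)` reproduces the widened [4 − 1] s range. It yields more, earlier windows per track.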
“…For pedestrians, body and facial keypoint detectors [35] act as the core of prediction systems. Deep learning approaches use body keypoints to anticipate changes in pedestrian motion patterns [36] and also to predict the intention of crossing from the sidewalk or at a crosswalk [37], [38]. Facial keypoints are of paramount importance for detecting crossing intention, since eye contact is a powerful non-verbal channel of communication often used to express intention to drivers.…”
Section: Overview of Predictive Perception Systems (mentioning)
confidence: 99%
“…In this field, 2D poses [2], [3], pedestrian bounding boxes [4], optical flow [5], scene context [6], vehicle speeds [7], trajectories [8] and vehicle ego-motion [7] are utilized in previous works. In the meantime, deep learning models such as I3D [5], LSTM/RNN-based temporal models [8], [9] and transformers [10] have been adopted in recent years. However, because of the high mobility of pedestrians, the prediction results of previous works are inconsistent with each other [11], especially at the starting time, when pedestrians appear at a small scale.…”
Section: Introduction (mentioning)
confidence: 99%