2019 International Conference on Robotics and Automation (ICRA) 2019
DOI: 10.1109/icra.2019.8793991

Real-time Intent Prediction of Pedestrians for Autonomous Ground Vehicles via Spatio-Temporal DenseNet

Abstract: Understanding the behaviors and intentions of humans is one of the main challenges autonomous ground vehicles still face. More specifically, in complex environments such as urban traffic scenes, inferring the intentions and actions of vulnerable road users such as pedestrians becomes even harder. In this paper, we address the problem of intent action prediction of pedestrians in urban traffic environments using only image sequences from a monocular RGB camera. We propose a real-time framewo…

Cited by 57 publications (50 citation statements)
References: 29 publications
“…
Method                            Accuracy
Alexnet + Context [15]            63.0%
Alexnet + SVM [50]                74.4%
Alphapose + LSTM [56]             78.0%
Res-EnDec [53]                    81.0%
ST-DenseNet [52]                  84.76%
auto-encoder + Prediction [54]    86.7%
Openpose + Keypoints [55]         88.0%
Alexnet + SVM + Context [50]      89.4%
CPN + GCN [58]                    91

Overall, although SPI-Net is not that complex in its architecture, Table 3 shows that it outperforms by more than 2.5% the current state-of-the-art approach [58] based on CPN [14] for pedestrian discrete intention prediction task on the JAAD data. The confusion matrices in Table 4 also shows that SPI-Net accuracy is similar on both action classes, which demonstrates its ability to adapt to intra-class variation for skeleton-based dynamics.…”
Section: Results on JAAD Data Set (mentioning)
confidence: 99%
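
The per-class comparison that the statement above draws from confusion matrices can be sketched as follows. This is a minimal illustration, not the citing authors' evaluation code: the function names are hypothetical, and the counts in the usage lines are made up for the example and are not JAAD results.

```python
def per_class_accuracy(confusion):
    """Return per-class accuracy (recall) for each row of a square confusion matrix.

    confusion[i][j] counts samples of true class i that were predicted as class j.
    """
    result = []
    for i, row in enumerate(confusion):
        total = sum(row)
        result.append(row[i] / total if total else 0.0)
    return result


def overall_accuracy(confusion):
    """Return the fraction of all samples that lie on the diagonal."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total if total else 0.0


# Made-up counts for two action classes (not real data).
example = [[45, 5],
           [10, 40]]
print(per_class_accuracy(example))
print(overall_accuracy(example))
```

When the per-class values are close to each other, the overall accuracy is not being carried by one dominant class, which is the property the quoted statement points to.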
“…Afterward, they extend their model in order to take as input a sequence of consecutive cropped images of the pedestrians before they cross in order to consider the temporal coherence in short-term motions (≈0.5 s). Similarly, Saleh et al [52] propose to predict the intended actions of pedestrians based on a spatio-temporal DenseNet model. Pop et al [8] propose to extract spatial information with convolutive layers, then consider temporal dynamics with recurrent layers and propose a new metric for pedestrians dynamics evaluation: the time to cross (TTC) prediction.…”
Section: Pedestrian Intention Prediction (mentioning)
confidence: 99%
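
The short-term window mentioned in the statement above (≈0.5 s of consecutive frames before a crossing) can be sketched as a slicing step. This is an illustrative sketch only: the function names are hypothetical, and the frame rate of 30 fps in the usage lines is an assumption, not a value taken from the cited work.

```python
def frames_in_window(duration_s, fps):
    """Number of consecutive frames that cover duration_s seconds at fps frames per second."""
    return max(1, round(duration_s * fps))


def clip_before_event(frames, event_index, window):
    """Return up to `window` frames that immediately precede frames[event_index].

    If fewer than `window` frames exist before the event, the clip is shorter.
    """
    start = max(0, event_index - window)
    return frames[start:event_index]


# Assumed frame rate of 30 fps; frame indices stand in for images.
window = frames_in_window(0.5, 30)
clip = clip_before_event(list(range(100)), event_index=60, window=window)
print(window, clip[0], clip[-1])
```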
“…In [19], a 3D CNN is used as a classifier at the end of a pipeline for pedestrian crossing behavior, which includes detection and tracking. The 3D convolutional model is trained with the cropped pedestrians’ bounding boxes detected.…”
Section: Related Work (mentioning)
confidence: 99%
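
The input preparation described in the statement above, where cropped pedestrian bounding boxes feed a 3D convolutional model, can be sketched in plain Python. This is a sketch under stated assumptions, not the pipeline of the cited paper: the function names, the `(x1, y1, x2, y2)` box convention with exclusive `x2` and `y2`, the nearest-neighbour resize, and the output size are all hypothetical choices made for illustration.

```python
def crop_and_resize(frame, box, out_h, out_w):
    """Crop box=(x1, y1, x2, y2) from a 2D frame and resize it by nearest neighbour.

    x2 and y2 are treated as exclusive bounds (an assumption of this sketch).
    """
    x1, y1, x2, y2 = box
    crop_h, crop_w = y2 - y1, x2 - x1
    if crop_h <= 0 or crop_w <= 0:
        raise ValueError("box must have positive width and height")
    return [
        [frame[y1 + (r * crop_h) // out_h][x1 + (c * crop_w) // out_w]
         for c in range(out_w)]
        for r in range(out_h)
    ]


def build_clip(frames, boxes, out_h=4, out_w=2):
    """Stack one resized crop per frame into a clip indexed as [time][row][col]."""
    if len(frames) != len(boxes):
        raise ValueError("need exactly one box per frame")
    return [crop_and_resize(f, b, out_h, out_w) for f, b in zip(frames, boxes)]


# Toy grayscale frames: pixel value encodes its (row, column) position.
frame = [[10 * r + c for c in range(6)] for r in range(8)]
clip = build_clip([frame] * 3, [(1, 2, 3, 6)] * 3)
print(len(clip), len(clip[0]), len(clip[0][0]))
```

Resizing every crop to the same size gives the clip a fixed time, height, and width layout, which is the kind of input a 3D convolution slides over.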
“…Also, DenseNets are simpler and more efficient as compared to Inception networks [9]. K. Saleh [13] employed Spatiotemporal DenseNet architecture to predict pedestrians' intended action and achieved an average precision score of 0.8476. The DenseNet architecture [12] where each layer is connected to all the others within a dense block.…”
Section: DenseNet (mentioning)
confidence: 99%
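
The dense connectivity described in the statement above, where each layer in a dense block receives the outputs of all earlier layers, can be sketched with plain lists. This is a toy illustration of the general DenseNet idea, not the spatio-temporal model of the cited paper: the function names are hypothetical, and the channel counts and stand-in layers in the usage lines are arbitrary.

```python
def dense_block_input_channels(initial_channels, growth_rate, num_layers):
    """Input channel count seen by each layer of a dense block.

    Each layer emits growth_rate feature maps and receives the concatenation
    of the block input and every earlier layer's output.
    """
    return [initial_channels + growth_rate * i for i in range(num_layers)]


def dense_block_forward(block_input, layers):
    """Run a toy dense block on flat feature lists.

    Each layer is a callable that takes all features accumulated so far
    and returns the new features it contributes.
    """
    features = list(block_input)
    for layer in layers:
        # Concatenate the new features; earlier ones are never overwritten.
        features = features + list(layer(features))
    return features


# Arbitrary example values.
print(dense_block_input_channels(64, 32, 4))
# Two stand-in "layers": one emits the sum, one emits the feature count.
print(dense_block_forward([1, 2], [lambda f: [sum(f)], lambda f: [len(f)]]))
```

Because features are concatenated rather than replaced, the input width of each layer grows linearly with its depth in the block.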