A pedestrian detection system is a crucial component of advanced driver assistance systems since it contributes to road traffic safety. The safety of traffic participants could be significantly improved if these systems could also recognize and predict pedestrians' actions, or even estimate, for each pedestrian, the time to cross the street. In this paper, we focus not only on pedestrian detection and pedestrian action recognition but also on estimating whether a pedestrian's action presents a risky situation according to the time to cross the street. We propose 1) a pedestrian detection and action recognition component based on RetinaNet; 2) an estimation of the time to cross the street for multiple pedestrians using a recurrent neural network. For each pedestrian, the recurrent network estimates the pedestrian's action intention in order to predict the time to cross the street. We base our experiments on the JAAD dataset and show that integrating multiple pedestrian action tags in the detection part, when merged with a recurrent neural network (LSTM), allows a significant performance improvement.
INDEX TERMS Action recognition, deep learning, pedestrian detection, time-to-cross estimation.
Abstract: A wide variety of approaches have been proposed for pedestrian detection in the last decade, yet it remains an open challenge of outstanding importance in the automotive field. In recent years, deep learning classification methods, in particular convolutional neural networks, combined with multi-modality images and applied in different fusion schemes, have achieved strong performance in computer vision tasks. For the pedestrian recognition task, the late-fusion scheme outperforms the early and intermediate integration of modalities. In this paper, we focus on improving and optimizing the late-fusion scheme for pedestrian classification on the Daimler stereo vision data set. We propose different training methods based on Cross-Modality deep learning of Convolutional Neural Networks (CNNs): (1) a correlated model, (2) an incremental model, and (3) a particular cross-modality model, where each CNN is trained on one modality but tested on a different one. The experiments show that the incremental cross-modality deep learning of CNNs achieves the best performance. It improves the classification performance not only for each modality classifier, but also for the multi-modality late-fusion scheme. The particular cross-modality model is a promising idea for automated annotation of modality images with a classifier trained on a different modality and/or for cross-dataset training.
In spite of the large number of existing methods, pedestrian detection remains an open challenge. In recent years, deep learning classification methods combined with multi-modality images within different fusion schemes have achieved the best performance. It was proven that the late-fusion scheme outperforms both direct and intermediate integration of modalities for pedestrian recognition. Hence, in this paper, we focus on improving the late-fusion scheme for pedestrian classification on the Daimler stereo vision data set. Each image modality (Intensity, Depth and Flow) is classified by an independent Convolutional Neural Network (CNN), the outputs of which are then fused by a Multi-layer Perceptron (MLP) before the recognition decision. We propose different methods based on Cross-Modality deep learning of CNNs: (1) a correlated model, where a unique CNN is trained with Intensity, Depth and Flow images for each frame, and (2) an incremental model, where a first CNN is trained on the image frames of the first modality, then a second CNN, initialized by transfer learning from the first one, is trained on the image frames of the second modality, and finally a third CNN, initialized from the second one, is trained on the image frames of the last modality. The experiments show that the incremental cross-modality deep learning of CNNs improves classification performance not only for each independent modality classifier, but also for the multi-modality classifier based on late fusion. Different learning algorithms are also investigated.
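The late-fusion step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the per-modality CNN outputs are replaced by random stand-in probabilities, and the function name `late_fusion_forward`, the hidden size, the two-class setup and the untrained random weights are all assumptions made for illustration only.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion_forward(modality_probs, w1, b1, w2, b2):
    """Fuse per-modality class probabilities with a one-hidden-layer MLP.

    modality_probs: list of (batch, n_classes) arrays, one per modality
    (here: Intensity, Depth, Flow). Hypothetical helper, not from the paper.
    """
    x = np.concatenate(modality_probs, axis=-1)  # (batch, 3 * n_classes)
    h = np.maximum(0.0, x @ w1 + b1)             # ReLU hidden layer
    return softmax(h @ w2 + b2)                  # fused class probabilities

rng = np.random.default_rng(0)
batch, n_classes, hidden = 4, 2, 8  # assumed sizes

# Stand-ins for the outputs of the three independent modality CNNs.
probs = [softmax(rng.normal(size=(batch, n_classes))) for _ in range(3)]

# Untrained MLP weights; in practice these would be learned.
w1 = rng.normal(scale=0.1, size=(3 * n_classes, hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(scale=0.1, size=(hidden, n_classes))
b2 = np.zeros(n_classes)

fused = late_fusion_forward(probs, w1, b1, w2, b2)
print(fused.shape)
```

The sketch only shows where fusion happens: each modality is classified independently, and the MLP sees nothing but the concatenated classifier outputs.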