Zhijie Fang scite author profile

Pedestrian Detection at Day/Night Time with Visible and FIR Cameras: A Comparison

Despite all the significant advances in pedestrian detection brought by computer vision for driving assistance, it is still a challenging problem. One reason is the extremely varying lighting conditions under which such a detector should operate, namely day and nighttime. Recent research has shown that the combination of visible and non-visible imaging modalities may increase detection accuracy, where the infrared spectrum plays a critical role. The goal of this paper is to assess the accuracy gain of different pedestrian models (holistic, part-based, patch-based) when training with images in the far infrared spectrum. Specifically, we want to compare detection accuracy on test images recorded at day and nighttime if trained (and tested) using (a) plain color images; (b) just infrared images; and (c) both of them. In order to obtain results for the last item, we propose an early fusion approach to combine features from both modalities. We base the evaluation on a new dataset that we have built for this purpose as well as on the publicly available KAIST multispectral dataset.
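The early fusion idea in this abstract can be illustrated with a small sketch. It assumes that fusion means concatenating the per-window feature vectors of the two modalities before a single classifier sees them; the abstract does not name the fusion operator, so this is only one plausible reading. The function name `early_fuse` and the toy feature values are hypothetical.

```python
from typing import List, Sequence


def early_fuse(visible: Sequence[float], fir: Sequence[float]) -> List[float]:
    """Concatenate visible-spectrum and far-infrared features.

    Assumption: early fusion is plain concatenation, so one classifier
    is trained on a single vector that covers both modalities.
    """
    return list(visible) + list(fir)


# Toy feature vectors standing in for descriptors of the same window.
visible_features = [0.2, 0.7, 0.1]
fir_features = [0.9, 0.4]

fused = early_fuse(visible_features, fir_features)
print(len(fused))  # dimensionality of the fused descriptor
```

Concatenation keeps both descriptors intact and leaves it to the classifier to weight them, which is why it is a common baseline for combining modalities.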


On-Board Detection of Pedestrian Intentions

Fang

Vázquez

López

2017

Sensors


Avoiding vehicle-to-pedestrian crashes is a critical requirement for today's advanced driver assistance systems (ADAS) and future self-driving vehicles. Accordingly, detecting pedestrians from raw sensor data has a history of more than 15 years of research, with vision playing a central role. In recent years, deep learning has boosted the accuracy of image-based pedestrian detectors. However, detection is just the first step towards answering the core question: is the vehicle going to crash into a pedestrian if preventive actions are not taken? Therefore, knowing as soon as possible whether a detected pedestrian intends to cross the road ahead of the vehicle is essential for performing safe and comfortable maneuvers that prevent a crash. Compared to pedestrian detection, however, there is relatively little literature on detecting pedestrian intentions. This paper aims to contribute along this line by presenting a new vision-based approach that analyzes the pose of a pedestrian over several frames to determine whether he or she is going to enter the road. We present experiments showing 750 ms of anticipation for pedestrians crossing the road, which at a typical urban driving speed of 50 km/h can provide 15 additional meters (compared to a pure pedestrian detector) for automatic vehicle reactions or for warning the driver. Moreover, in contrast with state-of-the-art methods, our approach is monocular, requiring neither stereo nor optical flow information.
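The link between anticipation time and distance is speed multiplied by time. The sketch below works this out under the assumption of constant vehicle speed; `distance_travelled_m` is a hypothetical helper, not something taken from the paper.

```python
def distance_travelled_m(speed_kmh: float, time_s: float) -> float:
    """Distance in meters covered at constant speed over time_s seconds."""
    speed_ms = speed_kmh * 1000.0 / 3600.0  # convert km/h to m/s
    return speed_ms * time_s


# 750 ms of anticipation at 50 km/h, assuming the speed does not change.
print(round(distance_travelled_m(50.0, 0.75), 1))
```

Under the constant-speed assumption alone this gives roughly 10.4 m, which is lower than the 15 m quoted in the abstract; the abstract does not state how its figure was derived.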


Is the Pedestrian going to Cross? Answering by 2D Pose Estimation

Fang

López

2018

104


Our recent work suggests that, thanks to today's powerful CNNs, image-based 2D pose estimation is a promising cue for determining pedestrian intentions such as crossing the road in the path of the ego-vehicle, stopping before entering the road, and starting to walk or bending towards the road. This statement is based on results obtained on non-naturalistic sequences (Daimler dataset), i.e. sequences choreographed specifically for the study. Fortunately, a new publicly available dataset (JAAD) has recently appeared that allows developing methods for detecting pedestrian intentions in naturalistic driving conditions; more specifically, for addressing the relevant question: is the pedestrian going to cross? Accordingly, in this paper we use JAAD to assess the usefulness of 2D pose estimation for answering this question. We combine CNN-based pedestrian detection, tracking and pose estimation to predict the crossing action from monocular images. Overall, the proposed pipeline provides new state-of-the-art results.
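One way to picture how per-frame poses of a tracked pedestrian could feed a crossing classifier is a sliding window over recent frames. The sketch below is an illustration only: the window length, the `(x, y)` keypoint format, and the `PoseWindow` class are all hypothetical, and the detector, tracker, pose estimator and classifier are left out entirely.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

# Hypothetical keypoint format: one (x, y) pair per body joint.
Keypoints = List[Tuple[float, float]]


class PoseWindow:
    """Keeps the last num_frames poses of one tracked pedestrian."""

    def __init__(self, num_frames: int) -> None:
        self.num_frames = num_frames
        # deque with maxlen drops the oldest pose automatically.
        self.frames: Deque[Keypoints] = deque(maxlen=num_frames)

    def push(self, keypoints: Keypoints) -> Optional[List[float]]:
        """Add one frame; return a flat feature vector once the window is full."""
        self.frames.append(keypoints)
        if len(self.frames) < self.num_frames:
            return None  # not enough history yet
        # Flatten frames in temporal order: x0, y0, x1, y1, ...
        return [c for frame in self.frames for point in frame for c in point]


window = PoseWindow(num_frames=3)
for t in range(3):
    features = window.push([(float(t), 0.0), (float(t), 1.0)])
print(len(features))  # 3 frames x 2 joints x 2 coordinates
```

The returned vector would then be passed to whatever classifier decides between crossing and not crossing.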


Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation

Fang

López

2020

IEEE Trans. Intell. Transport. Syst.


Anticipating the intentions of vulnerable road users (VRUs) such as pedestrians and cyclists is critical for performing safe and comfortable driving maneuvers. This is the case for human driving and, thus, should be taken into account by systems providing any level of driving assistance, from advanced driver assistance systems (ADAS) to fully autonomous vehicles (AVs). In this paper, we show how the latest advances in monocular vision-based human pose estimation, i.e. those relying on deep Convolutional Neural Networks (CNNs), make it possible to recognize the intentions of such VRUs. In the case of cyclists, we assume that they follow traffic rules and indicate future maneuvers with arm signals. In the case of pedestrians, no indications can be assumed. Instead, we hypothesize that the walking pattern of a pedestrian makes it possible to determine whether he/she intends to cross the road in the path of the ego-vehicle, so that the ego-vehicle must maneuver accordingly (e.g. slowing down or stopping). We show how the same methodology can be used for recognizing both pedestrians' and cyclists' intentions. For pedestrians, we perform experiments on the JAAD dataset. For cyclists, we did not find an analogous dataset; thus, we created our own by acquiring and annotating videos, which we share with the research community. Overall, the proposed pipeline provides new state-of-the-art results on the intention recognition of VRUs.


Bilinear neural network with 3-D attention for brain decoding of motor imagery movements from the human EEG

et al. 2020


Deep learning has achieved great success in areas such as computer vision and natural language processing. In the past, some work used convolutional networks to process EEG signals and matched or exceeded traditional machine learning methods. We propose a novel network structure called QNet. It contains a newly designed attention module, 3D-AM, which is used to learn the attention weights of EEG channels, time points, and feature maps. It provides a way to automatically learn the electrode and time selection. QNet uses a dual-branch structure to fuse bilinear vectors for classification. It performs four-, three-, and two-class classification on the EEG Motor Movement/Imagery Dataset. Average cross-validation accuracies of 65.82%, 74.75%, and 82.88% were obtained, which outperform the state of the art by 7.24%, 4.93%, and 2.45%, respectively. The article also visualizes the attention weights learned by QNet and shows its possible application for electrode channel selection.



Zhijie Fang

Pedestrian Detection at Day/Night Time with Visible and FIR Cameras: A Comparison

On-Board Detection of Pedestrian Intentions

Is the Pedestrian going to Cross? Answering by 2D Pose Estimation

Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation

Bilinear neural network with 3-D attention for brain decoding of motor imagery movements from the human EEG
