In the context of smart cities, monitoring pedestrian and vehicle movements is essential for recognizing abnormal events and preventing accidents. The method proposed in this work analyzes video streams captured by a vertically installed camera and performs contextual road user detection. The final detection is based on the fusion of the outputs of three different convolutional neural networks. We are simultaneously interested in detecting road users, their motion, and their location with respect to the static environment. We use YOLOv4 for object detection, FC-HarDNet for background semantic segmentation, and FlowNet 2.0 for motion detection. FC-HarDNet and YOLOv4 were retrained on our orthophotographic dataset. The last step is a data fusion module. The results show that the method can detect road users, identify the surfaces on which they move, quantify their apparent velocity, and estimate their actual velocity.
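To make the fusion step concrete, here is a minimal sketch of a late-fusion module under assumed input formats (YOLOv4 boxes as pixel rectangles, an FC-HarDNet label map, and a FlowNet 2.0 flow field aligned to the same image grid); the abstract does not specify the fusion logic, so this is illustrative only.

```python
# Minimal late-fusion sketch (not the authors' exact module). Each detection
# is tagged with the dominant ground-surface class beneath it and the mean
# optical-flow vector inside its bounding box.
import numpy as np

def fuse_outputs(boxes, seg_map, flow):
    """boxes   : list of (x1, y1, x2, y2, class_id, score) in pixels
    seg_map : (H, W) int array of background semantic labels
    flow    : (H, W, 2) float array of per-pixel (dx, dy) in px/frame
    """
    fused = []
    for (x1, y1, x2, y2, cls, score) in boxes:
        # Dominant static-surface class under the detection (e.g. road,
        # sidewalk); label IDs are dataset-specific assumptions.
        labels, counts = np.unique(seg_map[y1:y2, x1:x2], return_counts=True)
        surface = int(labels[np.argmax(counts)])
        # Apparent motion: mean flow vector inside the bounding box.
        mean_flow = flow[y1:y2, x1:x2].mean(axis=(0, 1))
        fused.append({
            "box": (x1, y1, x2, y2),
            "class": cls,
            "score": score,
            "surface": surface,
            "flow_px_per_frame": tuple(mean_flow),
        })
    return fused
```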
In this study, we present a technique that performs pedestrian and vehicle detection using an off-the-shelf object detector together with an existing optical-flow-based velocity estimation technique. Road user detection and apparent-movement quantification are carried out by YOLOv4 and FlowNet 2.0, respectively. The speed of each user is then estimated from the average optical flow within its bounding box. Experimental results show that the proposed method can effectively detect road users and estimate their velocity and direction of movement in low-light, low-contrast urban video scenes. The videos were captured with a low-cost fisheye RGB camera mounted vertically at a height of less than 10 m. The estimated speed, once the deformations introduced by the wide-angle lens are taken into account, is in very good agreement with reality.
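The per-box speed estimate can be sketched as follows; the frame rate and the ground sampling distance (metres per pixel on the road plane) are assumed calibration values for illustration, not figures from the study.

```python
# Hedged sketch of the speed estimate: average the optical flow inside a
# bounding box and convert px/frame to m/s. FPS and GSD are assumptions.
import numpy as np

def estimate_speed(flow, box, fps=30.0, gsd_m_per_px=0.05):
    """Return (speed in m/s, direction in degrees) for one detection.

    flow : (H, W, 2) array of per-pixel (dx, dy) displacements in px/frame
    box  : (x1, y1, x2, y2) bounding box in pixel coordinates
    """
    x1, y1, x2, y2 = box
    dx, dy = flow[y1:y2, x1:x2].mean(axis=(0, 1))
    px_per_frame = np.hypot(dx, dy)
    speed = px_per_frame * fps * gsd_m_per_px   # px/frame -> m/s
    direction = np.degrees(np.arctan2(dy, dx))  # image-plane heading
    return speed, direction

# e.g. a mean flow of 4 px/frame at 30 fps and 5 cm/px gives 6 m/s (21.6 km/h)
```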
In this paper, we present a deep learning processing flow aimed at Advanced Driving Assistance Systems (ADASs) for urban road users. Starting from a fine analysis of the optical setup of a fisheye camera, we present a detailed procedure to obtain Global Navigation Satellite System (GNSS) coordinates along with the speed of the moving objects. The camera-to-world transform incorporates the lens distortion function. YOLOv4, retrained with orthophotographic fisheye images, provides road user detection. All the information extracted from the image by our system represents a small payload and can easily be broadcast to the road users. The results show that our system properly classifies and localizes the detected objects in real time, even in low-light conditions. For an effective observation area of 20 m × 50 m, the localization error is on the order of one meter. Velocity estimation of the detected objects is carried out by offline processing with the FlowNet 2.0 algorithm; the accuracy is good, with an error below one meter per second over the urban speed range (0 to 15 m/s). Moreover, the almost orthophotographic configuration of the imaging system guarantees the anonymity of all street users.
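A sketch of one possible camera-to-world transform for a near-nadir fisheye camera is given below; the equidistant projection model, the intrinsic parameters, the camera height, and the reference GNSS position are all assumptions for illustration (the paper's calibrated distortion function is not reproduced here), and the camera's x-axis is assumed to point east.

```python
# Illustrative camera-to-world sketch for a downward-looking fisheye camera,
# followed by conversion of the local metric offset to GNSS coordinates.
import math

F_PX = 320.0        # assumed focal length in pixels (equidistant model)
CX, CY = 640, 640   # assumed principal point
CAM_HEIGHT = 9.0    # assumed camera height above the road plane, metres
CAM_LAT, CAM_LON = 48.8566, 2.3522   # placeholder reference GNSS position
EARTH_R = 6378137.0                  # WGS-84 equatorial radius, metres

def pixel_to_ground(u, v):
    """Map an image pixel to an (east, north) offset in metres on the road."""
    du, dv = u - CX, v - CY
    r = math.hypot(du, dv)
    if r == 0:
        return 0.0, 0.0
    theta = r / F_PX                    # equidistant fisheye: r = f * theta
    rho = CAM_HEIGHT * math.tan(theta)  # ground distance from the nadir point
    return rho * du / r, -rho * dv / r  # image y grows downward

def ground_to_gnss(east, north):
    """Equirectangular approximation, adequate for a ~50 m observation area."""
    lat = CAM_LAT + math.degrees(north / EARTH_R)
    lon = CAM_LON + math.degrees(east / (EARTH_R * math.cos(math.radians(CAM_LAT))))
    return lat, lon

lat, lon = ground_to_gnss(*pixel_to_ground(900, 500))
```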