This paper proposes a deep-learning model with task-specific bounding box regressors (TSBBRs) and a conditional back-propagation mechanism for detecting moving objects in advanced driver assistance system (ADAS) applications. The proposed model separates the detection networks for objects of different sizes to achieve better results for both large and small objects. For large objects, a network with a large visual receptive field gathers information from wide areas; for small objects, a network with a smaller receptive field exploits fine-grained features. The conditional back-propagation mechanism trains each type of TSBBR only on data matching its size criterion, so the heads learn representations for different object sizes without degrading each other. The dual-path design of the bounding box regressors detects objects across widely varying scales and aspect ratios simultaneously. A single network inference per frame suffices to detect multiple object types, such as bicycles, motorbikes, cars, buses, trucks, and pedestrians, and to locate their exact positions. The model was developed and implemented on several NVIDIA devices, the GTX 1080 Ti, DRIVE PX2, and Jetson TX2, achieving 67, 19.4, and 8.9 frames per second (fps), respectively, for 448 × 448 video input. It can detect objects as small as 13 × 13 pixels and achieves 86.54% accuracy on the publicly available Pascal Visual Object Classes (VOC) car database and 82.4% mean average precision (mAP) on a large collection of real road scenes (the iVS database).
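The dual-path regressors and conditional back-propagation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the layer widths, the use of dilation for the larger receptive field, and the `conditional_loss` helper are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class DualPathRegressor(nn.Module):
    """Sketch of dual-path task-specific bounding box regressors:
    a shared backbone feature map feeds two heads whose receptive
    fields differ (dilated convs for large objects, plain convs
    over fine-grained features for small ones)."""
    def __init__(self, in_ch=256, num_outputs=5):
        super().__init__()
        # Large-object path: dilation enlarges the receptive field.
        self.large_head = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv2d(128, num_outputs, 1),
        )
        # Small-object path: standard 3x3 convs preserve fine detail.
        self.small_head = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, num_outputs, 1),
        )

    def forward(self, feats):
        return self.large_head(feats), self.small_head(feats)

def conditional_loss(pred_large, pred_small, target, is_large_mask):
    """Conditional back-propagation: each head's loss is taken only on
    cells whose ground-truth object size matches its criterion, so
    gradients for one scale never degrade the other head."""
    l1 = nn.functional.smooth_l1_loss(pred_large, target, reduction="none")
    l2 = nn.functional.smooth_l1_loss(pred_small, target, reduction="none")
    mask = is_large_mask.float()
    return (l1 * mask).mean() + (l2 * (1.0 - mask)).mean()
```

Masking the loss per head, rather than splitting the training set, lets both heads share one backbone while still learning size-specific representations.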
This paper proposes a lightweight moving-object prediction system, based on a 3D convolution network for behavior prediction, to detect and recognize pedestrians crossing, vehicles cutting in, and vehicles ahead applying emergency brakes. The proposed design significantly improves on the conventional 3D convolution network (C3D) by adding a behavior recognition network capable of object localization, which is pivotal for capturing the behaviors of numerous moving objects; the objects detected by the proposed C3D model are combined with and verified against the results of a YOLOv3 detection model. Because the proposed system is a lightweight CNN model requiring far fewer parameters, it can be efficiently realized on an embedded system for real-time applications. The proposed lightweight C3D model achieves 10 frames per second (fps) on an NVIDIA Jetson AGX Xavier and yields over 92.8% accuracy in recognizing pedestrian crossing, over 94.3% accuracy in detecting vehicle cut-in behavior, and over 95% accuracy for vehicles applying emergency brakes.
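A C3D-style behavior classifier of the kind described above can be sketched as a short stack of 3D convolutions over a clip of frames. This is a generic illustration of the technique, not the paper's network: the channel counts, pooling schedule, and class list are assumptions.

```python
import torch
import torch.nn as nn

class LiteC3D(nn.Module):
    """Sketch of a lightweight C3D-style classifier: 3D convolutions
    over a clip of frames, ending in one logit per behavior class
    (e.g. crossing, cutting-in, emergency braking)."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),  # (T,H,W) preserved
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                     # pool space only
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d((2, 2, 2)),                     # pool time and space
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):                 # clip: (N, 3, T, H, W)
        x = self.features(clip).flatten(1)   # (N, 32)
        return self.classifier(x)
```

Pooling only spatially in the first stage is a common C3D convention that keeps early temporal resolution, which matters for short behaviors such as a sudden brake.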
This paper proposes an improvement to an image-based multi-object tracking framework. By analyzing the role and performance of each block in the original system, the blocks are reconstructed to enhance efficiency and yield a processing speed fast enough for real-time applications. In the proposed method, the first two stages of the multi-object tracking pipeline are merged into a single neural network that performs both object detection and feature extraction. A new object association judgment method and a JDE-inspired prediction head are included to achieve a stronger association effect, improving on the original system by 45.2% overall. The enhanced method targets smart roadside units and uses fixed-viewpoint image input to achieve multi-object tracking on embedded platforms. It is implemented on the NVIDIA Jetson AGX Xavier embedded platform, with the NVIDIA TensorRT software development kit used to accelerate the neural network. The proposed system is more efficient than the original SDE design, and its overall computing performance reaches 14-26 images per second, making it well suited to real-time smart roadside unit applications.
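The core idea of JDE-style association, matching existing tracks to new detections by the appearance embeddings that the shared network emits, can be sketched as below. The cosine-similarity metric, the greedy matching strategy, and the `thresh` parameter are assumptions for illustration; the paper's actual association judgment method is not specified here.

```python
import numpy as np

def associate(track_embs, det_embs, thresh=0.5):
    """Sketch of embedding-based track/detection association:
    normalize appearance embeddings, score all pairs by cosine
    similarity, then greedily take the best remaining pair until
    the similarity falls below `thresh`."""
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    sim = t @ d.T                    # (num_tracks, num_detections)
    matches, used_t, used_d = [], set(), set()
    for ti, di in sorted(np.ndindex(sim.shape), key=lambda p: -sim[p]):
        if ti in used_t or di in used_d or sim[ti, di] < thresh:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    return matches
```

Production trackers usually replace the greedy loop with an optimal assignment (e.g. the Hungarian algorithm) and blend appearance similarity with a motion-based distance, but the data flow is the same.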
This paper proposes a deep-learning-based early fusion method for mmWave radar and RGB camera sensors for object detection and tracking, together with its embedded-system realization for ADAS applications. The proposed system can be used not only in ADAS but also in smart Road Side Units (RSUs) in transportation systems to monitor real-time traffic flow and warn road users of potentially dangerous situations. Because mmWave radar signals are less affected by weather and lighting, whether cloudy, sunny, snowy, rainy, or at night, the radar works efficiently in both normal and adverse conditions. Compared to using an RGB camera alone for object detection and tracking, early fusion of mmWave radar and RGB camera data compensates for the camera's poor performance when it fails due to bad weather and/or lighting conditions. The proposed method combines radar and camera features and outputs results directly from an end-to-end trained deep neural network. The complexity of the overall system is also reduced, so the method can be implemented on PCs as well as on embedded systems such as the NVIDIA Jetson Xavier, where it runs at 17.39 fps.
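A common way to realize early fusion of this kind is to project radar returns into the camera image plane and rasterize them as extra input channels stacked with the RGB frame before the detector. The sketch below illustrates that pre-processing step under assumed conventions (pinhole intrinsics `K`, radar points as `(x, y, z, doppler)` in camera coordinates); it is not the paper's exact fusion scheme.

```python
import numpy as np

def radar_to_image_channels(points, K, img_hw):
    """Project radar returns (x, y, z, doppler) through camera
    intrinsics K and rasterize range and radial velocity as two
    image-aligned channels."""
    h, w = img_hw
    chans = np.zeros((2, h, w), dtype=np.float32)  # [range, doppler]
    for x, y, z, v in points:
        if z <= 0:
            continue                    # point is behind the camera
        u, vpx, s = K @ np.array([x, y, z])
        u, vpx = int(u / s), int(vpx / s)
        if 0 <= u < w and 0 <= vpx < h:
            chans[0, vpx, u] = z        # range
            chans[1, vpx, u] = v        # radial velocity
    return chans

def early_fuse(rgb, radar_chans):
    """Stack RGB (3,H,W) with radar channels -> a (5,H,W) network input."""
    return np.concatenate([rgb, radar_chans], axis=0)
```

Fusing at the input, rather than merging per-sensor detections afterwards, lets a single end-to-end trained network exploit radar evidence precisely where the camera signal is weak.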