In this paper, we present a novel 2D–3D pedestrian tracker designed for applications in autonomous vehicles. The system operates on a tracking-by-detection principle and can track multiple pedestrians in complex urban traffic situations. By using a behavioral motion model and a non-parametric distribution as the state model, we are able to accurately track unpredictable pedestrian motion in the presence of heavy occlusion. Tracking is performed independently on the image plane and the ground plane, in global, motion-compensated coordinates. We employ camera and LiDAR data fusion to solve the association problem, where the optimal solution is found by matching 2D and 3D detections to tracks using a joint log-likelihood observation model. Each 2D–3D particle filter then updates its state from the associated observations and the behavioral motion model. Each particle moves independently, following pedestrian motion parameters learned offline from an annotated training dataset. Temporal stability of the state variables is achieved by modeling each track as a Markov Decision Process with probabilistic state transition properties. A novel track management system then handles high-level actions such as track creation, deletion and interaction. Using a probabilistic track score, the track manager can cull false and ambiguous detections while updating tracks with detections from actual pedestrians. Our system is implemented on a GPU and exploits the massively parallelizable nature of particle filters. Due to the Markovian nature of our track representation, the system achieves real-time performance with a minimal memory footprint. Exhaustive and independent evaluation of our tracker was performed by the KITTI benchmark server, where it was tested against a wide variety of unknown pedestrian tracking situations. On this realistic benchmark, we outperform all published pedestrian trackers on a multitude of tracking metrics.
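As a rough illustration of the per-track predict/update cycle the abstract describes, the sketch below implements a generic 2D–3D particle filter step with a constant-velocity prior and independent Gaussian observation terms. The function name, the noise scales, and the `project_to_image` callback are illustrative assumptions, not the paper's learned behavioral model.

```python
import numpy as np

rng = np.random.default_rng(0)

def step_track(particles, weights, det_3d, det_2d, project_to_image, dt=0.1):
    """One predict/update cycle for a single pedestrian track.

    particles: (N, 4) array of ground-plane states [x, y, vx, vy].
    det_3d:    associated 3D detection as [x, y], or None if missed.
    det_2d:    associated image-plane detection (pixel column), or None.
    project_to_image: camera projection callback (assumed given).
    """
    n = len(particles)

    # Predict: constant-velocity motion plus per-particle behavioral noise.
    # In the paper these parameters are learned offline from annotated
    # trajectories; here they are fixed illustrative values.
    particles[:, 0:2] += particles[:, 2:4] * dt
    particles += rng.normal(0.0, [0.05, 0.05, 0.2, 0.2], size=particles.shape)

    # Update: joint log-likelihood of the 3D and 2D observations.
    log_w = np.log(weights + 1e-300)
    if det_3d is not None:
        err = np.linalg.norm(particles[:, 0:2] - det_3d, axis=1)
        log_w += -0.5 * (err / 0.3) ** 2              # 3D ground-plane term
    if det_2d is not None:
        u = project_to_image(particles[:, 0:2])       # (N,) pixel columns
        log_w += -0.5 * ((u - det_2d) / 15.0) ** 2    # image-plane term

    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    # Systematic resampling keeps the non-parametric state distribution.
    positions = (rng.random() + np.arange(n)) / n
    idx = np.minimum(np.searchsorted(np.cumsum(w), positions), n - 1)
    return particles[idx].copy(), np.full(n, 1.0 / n)
```

Because each particle is propagated and weighted independently, this loop maps directly onto a GPU kernel, which is the parallelism the abstract exploits.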
Depth images generated by direct projection of LiDAR point clouds onto the image plane suffer from a high level of sparsity, which is difficult for classical computer vision algorithms to interpret. We propose a method for completing sparse depth images in a semantically accurate manner by training a novel morphological neural network. Our method approximates morphological operations with Contraharmonic Mean Filter layers, which are easily trained in a contemporary deep learning framework. An early-fusion U-Net architecture then combines the dilated depth channels with RGB using multi-scale processing. Using a large-scale RGB-D dataset, we are able to learn the optimal morphological and convolutional filter shapes that produce an accurate and fully sampled depth image at the output. Independent experimental evaluation confirms that our method outperforms classical image restoration techniques as well as current state-of-the-art neural networks. The resulting depth images preserve object boundaries and can easily be used to augment various tasks in intelligent-vehicle perception systems.
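The contraharmonic mean trick makes morphology differentiable: the filter computes conv(x^{p+1}, w) / conv(x^p, w), which approaches grayscale dilation as p grows positive and erosion as p grows negative, so p can be learned by backpropagation. Below is a minimal single-channel PyTorch sketch of such a layer; the class name `CHMPool`, the flat structuring element, and the initial p are illustrative choices, not the paper's learned configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CHMPool(nn.Module):
    """Contraharmonic mean filter layer for single-channel depth maps.

    f(x) = conv(x^(p+1), w) / conv(x^p, w); a large positive p
    approximates dilation (filling holes in sparse depth), a large
    negative p approximates erosion. p is a learnable parameter.
    """
    def __init__(self, kernel_size=5, init_p=2.0):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(init_p))
        # Flat structuring element; the paper also learns filter shapes.
        self.register_buffer("w", torch.ones(1, 1, kernel_size, kernel_size))
        self.pad = kernel_size // 2

    def forward(self, x, eps=1e-6):
        # Shift to strictly positive values so x**p is well defined;
        # near-zero (missing) pixels then contribute almost nothing
        # to the numerator when p is large.
        x = x.clamp_min(eps)
        num = F.conv2d(x.pow(self.p + 1), self.w, padding=self.pad)
        den = F.conv2d(x.pow(self.p), self.w, padding=self.pad)
        return num / (den + eps)
```

A stack of such layers densifies the sparse depth channel before it is concatenated with RGB and fed to the U-Net.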
In this paper we propose a novel real-time method for SLAM in autonomous vehicles. The environment is mapped using a probabilistic occupancy map model, and ego-motion is estimated within the same environment via a feedback loop. This simplifies pose estimation from 6 to 3 degrees of freedom, which greatly improves the robustness and accuracy of the system. Input data is provided by a rotating laser scanner as 3D measurements of the current environment, which are projected onto the ground plane. The local ground plane is estimated in real time from the actual point cloud data using a robust plane-fitting scheme based on the RANSAC principle. The computed occupancy map is then registered against the previous map using phase correlation in order to estimate the translation and rotation of the vehicle. Experimental results demonstrate that the method produces high-quality occupancy maps, and the measured translation and rotation errors of the trajectories are lower than those of other 6-DOF methods. The entire SLAM system runs on a mid-range GPU and keeps up with the data rate of the sensor, which frees computational power for the other tasks of the autonomous vehicle.
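The registration step is standard phase correlation: the translation between two occupancy grids appears as a delta peak in the inverse FFT of their normalized cross-power spectrum. The sketch below covers the translation estimate only; rotation is conventionally recovered by running the same correlation on a polar resampling of the spectra (a Fourier–Mellin-style step), which is omitted here and may differ from the paper's exact pipeline.

```python
import numpy as np

def phase_correlation(prev_map, curr_map):
    """Estimate the 2D shift between two same-sized occupancy grids."""
    F0 = np.fft.fft2(prev_map)
    F1 = np.fft.fft2(curr_map)
    cross = F0 * np.conj(F1)
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint wrap around to negative shifts.
    h, w = corr.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return dx, dy
```

Because the whole estimate reduces to a handful of 2D FFTs, it parallelizes naturally on the GPU, consistent with the real-time claim above.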
A growing interest in technologies for autonomous driving emphasizes the demand for safe and reliable perception systems in various driving conditions. Current state-of-the-art perception solutions rely on data-driven machine learning approaches and require large amounts of annotated data to train accurate models. In this study we identify limitations in the existing radar-based traffic datasets and propose a richer, annotated raw radar dataset. The proposed solution is a semi-automatic data labeling tool, which generates an initial set of candidate annotations using state-of-the-art automatic object recognition algorithms and requires only minimal manual intervention. In the first such qualitative evaluation for automotive radar datasets, we measure the quality of the automatically computed labels under various lighting conditions, occlusion, behavior and modeling bias, based on a multitude of tracking metrics. We determine the specific cases where automatic labeling is sufficient and those where a human annotator needs to inspect and manually correct errors made by the algorithms.
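The abstract does not detail the tool's internals, but the core of any such semi-automatic pipeline is a triage step that auto-accepts confident detections and routes borderline ones to a human. The sketch below is a hypothetical illustration under that assumption; the `Candidate` fields and thresholds are invented for clarity, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    frame: int
    box: tuple      # (x, y, w, h) in radar-grid coordinates (assumed layout)
    label: str
    score: float    # detector confidence in [0, 1]

def triage(candidates, accept_thresh=0.8, review_thresh=0.4):
    """Split detector output into auto-accepted labels and a manual
    review queue; candidates below both thresholds are dropped."""
    auto, review = [], []
    for c in candidates:
        if c.score >= accept_thresh:
            auto.append(c)
        elif c.score >= review_thresh:
            review.append(c)
    return auto, review
```

The evaluation described above effectively measures how often the `auto` set is already correct in each condition, and how much of the `review` queue a human must actually fix.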
We present a novel technique for fast and accurate reconstruction of depth images from 3D point clouds acquired in urban and rural driving environments. Our approach relies entirely on the sparse distance and reflectance measurements generated by a LiDAR sensor. The main contribution of this paper is a combined segmentation and upsampling technique that preserves the important semantic structure of the scene. Data from the point cloud is segmented and projected onto a virtual camera image, where a series of image processing steps is applied in order to reconstruct a fully sampled depth image. We achieve this by means of a multilateral filter that is guided into regions of distinct objects in the segmented point cloud. The gains of the proposed approach are thus twofold: measurement noise in the original data is suppressed, and missing depth values are reconstructed at arbitrary resolution. Objective evaluation in an automotive application shows state-of-the-art accuracy of our reconstructed depth images. Finally, we show the qualitative value of our images by training and evaluating an RGB-D pedestrian detection system. By reinforcing the RGB pixels with our reconstructed depth values in the learning stage, a significant increase in detection rates can be realized while the model complexity remains comparable to the baseline.
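A segmentation-guided multilateral filter combines several weighting terms, here spatial proximity, depth similarity, and same-segment membership, so that values are never averaged across object boundaries. The sketch below is a simplified hole-filling variant at input resolution, assuming a dense segment-label image from the projected segmentation; the paper's filter also uses reflectance and upsamples to arbitrary resolution, which this illustration omits.

```python
import numpy as np

def multilateral_fill(depth, labels, radius=4, sigma_s=2.0, sigma_r=0.5):
    """Fill missing depth pixels (zeros) with a weighted average of
    neighbors that belong to the same segmented object."""
    h, w = depth.shape
    out = depth.copy()
    for y, x in zip(*np.where(depth == 0)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        d = depth[y0:y1, x0:x1]
        valid = (d > 0) & (labels[y0:y1, x0:x1] == labels[y, x])
        if not valid.any():
            continue                         # no same-object support
        yy, xx = np.mgrid[y0:y1, x0:x1]
        w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
        ref = np.median(d[valid])            # robust local depth reference
        w_r = np.exp(-((d - ref) ** 2) / (2 * sigma_r ** 2))
        wgt = w_s * w_r * valid              # boundary-preserving weights
        out[y, x] = (wgt * d).sum() / wgt.sum()
    return out
```

Restricting the support to same-label pixels is what keeps object silhouettes sharp in the reconstructed depth image, which in turn is why the depth channel helps the RGB-D pedestrian detector.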