Reliable vision in challenging illumination conditions is one of the crucial requirements of future autonomous automotive systems. In the last decade, thermal cameras have become accessible to a larger number of researchers, resulting in numerous studies that confirm the benefits of thermal cameras in limited-visibility conditions. In this paper, we propose a learning-based method for visible and thermal image fusion that focuses on generating fused images with high visual similarity to regular truecolor (red-green-blue, or RGB) images, while introducing new informative details in pedestrian regions. The goal is to create natural, intuitive images that are more informative to a human driver than a regular RGB camera in challenging visibility conditions. The main novelty of this paper is the idea of relying on two types of objective functions for optimization: a similarity metric between the RGB input and the fused output to achieve natural image appearance, and an auxiliary pedestrian detection error to help define relevant features of human appearance and blend them into the output. We train a convolutional neural network on image samples from variable conditions (day and night) so that the network learns the appearance of humans in the different modalities and produces more robust results applicable in realistic situations. Our experiments show that the visibility of pedestrians is noticeably improved, especially in dark regions and at night. Compared to existing methods, our approach can better learn context and define fusion rules that focus on pedestrian appearance, which is not guaranteed by methods that optimize low-level image quality metrics.
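The two-term objective described above can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's implementation: the similarity term is rendered as a mean-squared error against the RGB input, the auxiliary detection error as a binary cross-entropy on a pedestrian mask, and the weight `alpha` is an illustrative hyperparameter.

```python
import numpy as np

def fusion_loss(fused, rgb, det_pred, det_gt, alpha=0.8):
    """Combined objective: similarity to the RGB input plus an
    auxiliary pedestrian-detection error. `alpha` balances natural
    appearance against pedestrian visibility (illustrative values)."""
    # Similarity term: MSE keeps the fused image close to the
    # natural RGB appearance.
    sim = np.mean((fused - rgb) ** 2)
    # Detection term: binary cross-entropy on a pedestrian mask,
    # penalizing fusion that degrades pedestrian detectability.
    p = np.clip(det_pred, 1e-7, 1 - 1e-7)
    det = -np.mean(det_gt * np.log(p) + (1 - det_gt) * np.log(1 - p))
    return alpha * sim + (1 - alpha) * det
```

With a fused output identical to the RGB input, only the detection term contributes, so the gradient pressure comes entirely from pedestrian regions, which is the intuition behind the combined objective.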
A growing interest in technologies for autonomous driving emphasizes the demand for safe and reliable perception systems in various driving conditions. Current state-of-the-art perception solutions rely on data-driven machine learning approaches and require large amounts of annotated data to train accurate models. In this study we identify limitations in existing radar-based traffic datasets and propose a richer, annotated raw radar dataset. The proposed solution is a semi-automatic data labeling tool, which generates an initial set of candidate annotations using state-of-the-art automatic object recognition algorithms and requires only minimal manual intervention. In the first qualitative evaluation of its kind for automotive radar datasets, we measure the quality of automatically computed labels under varying light conditions, occlusion, behavior, and modeling bias, based on a multitude of tracking metrics. We determine the specific cases where automatic labeling is sufficient and where a human annotator needs to inspect and manually correct errors made by the algorithms.
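The semi-automatic triage step described above can be sketched as a simple confidence gate. This is a hypothetical illustration of the workflow, not the tool's actual logic: the `confidence` field and the threshold value are assumptions.

```python
def triage_labels(candidates, conf_threshold=0.8):
    """Split automatically generated candidate annotations into
    auto-accepted labels and labels flagged for manual review,
    based on detector confidence (threshold is illustrative)."""
    accepted, to_review = [], []
    for label in candidates:
        if label["confidence"] >= conf_threshold:
            accepted.append(label)      # keep as-is, no human needed
        else:
            to_review.append(label)     # route to a human annotator
    return accepted, to_review
```

The key design point is that the human annotator only sees the low-confidence residue, which is what keeps the manual intervention minimal.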
Interpolation from a Color Filter Array (CFA) is the most common method for obtaining full color image data. Its success relies on the smart combination of a CFA and a demosaicing algorithm. Demosaicing, on the one hand, has been extensively studied. Algorithmic development in the past 20 years ranges from simple linear interpolation to modern neural-network-based (NN) approaches that encode the prior knowledge of millions of training images to fill in missing data in an inconspicuous way. CFA design, on the other hand, is less well studied, although still recognized to strongly impact demosaicing performance. This is because demosaicing algorithms are typically limited to one particular CFA pattern, impeding straightforward CFA comparison. This is starting to change with newer classes of demosaicing that may be considered generic or CFA-agnostic. In this study, by comparing the performance of two state-of-the-art generic algorithms, we evaluate the potential of modern CFA-demosaicing. We test the hypothesis that, with the increasing power of NN-based demosaicing, the influence of optimal CFA design on system performance decreases. This hypothesis is supported by the experimental results. Such a finding would herald the possibility of relaxing CFA requirements, providing more freedom in the CFA design choice and producing high-quality cameras.
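The CFA sampling step that a CFA-agnostic demosaicer must invert can be sketched as follows. This is a minimal illustrative model, assuming the CFA is specified as a small tile of channel indices; the pattern shown is the classic Bayer RGGB layout, but any tileable pattern works, which is exactly the flexibility generic demosaicing aims to exploit.

```python
import numpy as np

def apply_cfa(rgb, pattern):
    """Sample a full RGB image through a color filter array.
    `pattern` is a 2-D array of channel indices (0=R, 1=G, 2=B)
    tiled over the image. A CFA-agnostic demosaicer would receive
    this single-channel mosaic plus the pattern itself as input."""
    h, w, _ = rgb.shape
    ph, pw = pattern.shape
    # Tile the pattern across the image, then keep one channel per pixel.
    idx = np.tile(pattern, (h // ph + 1, w // pw + 1))[:h, :w]
    rows, cols = np.indices((h, w))
    return rgb[rows, cols, idx]

# Classic Bayer RGGB tile; swapping this array simulates any other CFA.
bayer_rggb = np.array([[0, 1],
                       [1, 2]])
```

Because the demosaicer takes the pattern as an input rather than hard-coding it, the same network can be evaluated across CFA designs, enabling the comparison the study performs.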
Millimeter-wave radar is currently the most effective automotive sensor capable of all-weather perception. In order to detect Vulnerable Road Users (VRUs) in cluttered radar data, it is necessary to model the time-frequency signal patterns of human motion, i.e. the micro-Doppler signature. In this paper we propose a spatio-temporal Convolutional Neural Network (CNN) capable of detecting VRUs in cluttered radar data. The main contribution is a weakly supervised training method which uses abundant, automatically generated labels from camera and lidar to train the model. The input to the network is a tensor of temporally concatenated range-azimuth-Doppler arrays, while the ground truth is an occupancy grid formed by objects detected jointly in camera images and lidar. Lidar provides accurate ranging ground truth, while camera information helps distinguish between VRUs and background. Experimental evaluation shows that the CNN model has superior detection performance compared to classical techniques. Moreover, the model trained with imperfect, weak supervision labels outperforms the one trained with a limited number of perfect, hand-annotated labels. Finally, the proposed method has excellent scalability due to the low cost of automatic annotation.
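The weak-label generation step can be illustrated with a toy rasterizer that turns camera-and-lidar VRU detections into a range-azimuth occupancy grid. The detection tuple format, grid resolution, and field-of-view values below are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def make_occupancy_grid(detections, grid_shape=(32, 32),
                        r_max=50.0, az_fov=np.pi / 2):
    """Rasterize (range_m, azimuth_rad, is_vru) detections fused from
    camera and lidar into a range-azimuth occupancy grid that serves
    as weak supervision for the radar CNN (geometry is illustrative)."""
    grid = np.zeros(grid_shape, dtype=np.float32)
    n_r, n_az = grid_shape
    for r, az, is_vru in detections:
        # Keep only VRUs that fall inside the grid extent.
        if not is_vru or r >= r_max or abs(az) >= az_fov / 2:
            continue
        ri = int(r / r_max * n_r)                       # range bin
        ai = int((az + az_fov / 2) / az_fov * n_az)     # azimuth bin
        grid[ri, ai] = 1.0
    return grid
```

Because each grid is produced automatically from the camera/lidar pipeline, the labels are cheap and abundant, at the cost of occasional errors, which is the trade-off the paper's experiments evaluate.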