This work presents a mid-fusion pipeline that can increase the detection performance of a convolutional neural network (RetinaNet) by including polarimetric images, even though the network is trained on a large-scale database of RGB and monochromatic images (Microsoft COCO). Performance is quantified by the average precision (AP) for each object class. The goal of this work is to evaluate the usefulness of polarimetry for object detection and recognition in road scenes and to determine the conditions under which the AP increases. Shadows, reflections, albedo, and other object features that reduce RGB image contrast also decrease the AP. This work demonstrates specific cases in which the AP increases when linear Stokes and polarimetric flux images are used. Images are fused within the neural network evaluation pipeline, an approach referred to as mid-fusion. The AP of polarimetric mid-fusion is greater than the RGB AP in 54 of 80 detection instances. Recall values for cars and buses are similar for RGB and polarimetry, whereas recall for people increases from 36% to 38% with polarimetry. Videos of linear Stokes images are collected for four scenes at three times of day and in two driving directions. Despite this limited dataset and the use of a pretrained network, this work demonstrates selective enhancement of object detection through mid-fusion of polarimetry into neural networks trained on RGB images.
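For reference, linear Stokes images can be formed from intensity images taken through linear polarizers. The following is a minimal NumPy sketch. It assumes four polarizer orientations (0°, 45°, 90°, 135°), such as a division-of-focal-plane camera provides. The capture hardware, the function names, and the degree and angle of linear polarization helpers are illustrative assumptions, not the acquisition or fusion pipeline used in this work. The abstract does not specify how these channels enter the network.

```python
import numpy as np

def linear_stokes(i0, i45, i90, i135):
    """Linear Stokes images (S0, S1, S2) from four polarizer-angle intensities.

    Assumption: ideal linear polarizers at 0, 45, 90 and 135 degrees, so that
    I(theta) = 0.5 * (S0 + S1*cos(2*theta) + S2*sin(2*theta)).
    """
    i0, i45, i90, i135 = (np.asarray(x, dtype=np.float64) for x in (i0, i45, i90, i135))
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity, averaged over both polarizer pairs
    s1 = i0 - i90                       # horizontal minus vertical component
    s2 = i45 - i135                     # +45 minus -45 degree component
    return s0, s1, s2

def dolp(s0, s1, s2, eps=1e-12):
    """Degree of linear polarization; lies in [0, 1] for physical data."""
    return np.sqrt(s1**2 + s2**2) / np.maximum(s0, eps)

def aolp(s1, s2):
    """Angle of linear polarization in radians, in (-pi/2, pi/2]."""
    return 0.5 * np.arctan2(s2, s1)

# Usage with a single pixel of fully horizontally polarized light (S0 = 1).
s0, s1, s2 = linear_stokes([1.0], [0.5], [0.0], [0.5])
print(s0, s1, s2, dolp(s0, s1, s2), aolp(s1, s2))
```

Averaging both polarizer pairs for S0 uses all four measurements. For ideal polarizers it equals either pair sum, I0 + I90 or I45 + I135.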
The performance of a convolutional neural network (CNN) on an image texture detection task is investigated as a function of linear image processing and the number of training images. Performance is quantified by the area under the receiver operating characteristic (ROC) curve (AUC). The Ideal Observer (IO) maximizes the AUC but depends on high-dimensional image likelihoods. In many cases, the CNN performance can approximate the IO performance. This work demonstrates counterexamples in which a full-rank linear transform degrades the CNN performance below that of the IO, even in the limit of large quantities of training data and network layers. A subsequent linear transform changes the images’ correlation structure, improves the AUC, and again demonstrates the dependence of the CNN on linear processing. Compression can only decrease or maintain the IO detection performance, whereas compression can increase the CNN performance, especially for small quantities of training data. Results indicate an optimal compression ratio for the CNN that depends on task difficulty, compression method, and number of training images.
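The IO properties invoked above can be checked numerically in a simplified setting. This toy sketch assumes Gaussian images with equal covariance under both hypotheses and known means (signal known exactly). In that case the IO log-likelihood ratio is linear and reduces to the Hotelling observer, with detectability d'² = Δμᵀ K⁻¹ Δμ and AUC = Φ(d'/√2).

The texture task in this work may instead differ in covariance, which makes the IO quadratic. This sketch only illustrates the linear-Gaussian case. The dimensions, random matrices, and function names are hypothetical, and the numbers are synthetic rather than results from this work.

```python
import math
import numpy as np

def hotelling_dprime(delta_mu, cov):
    """Hotelling-observer detectability d' = sqrt(dmu^T K^-1 dmu)."""
    return float(np.sqrt(delta_mu @ np.linalg.solve(cov, delta_mu)))

def auc_from_dprime(d):
    """AUC for equal-variance Gaussian test statistics: Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2))."""
    return 0.5 * (1.0 + math.erf(d / 2.0))

rng = np.random.default_rng(0)
n = 16                                  # pixels per flattened image (illustrative)
delta_mu = rng.normal(size=n)           # signal-present minus signal-absent mean image
L = rng.normal(size=(n, n))
cov = L @ L.T + n * np.eye(n)           # symmetric positive-definite image covariance

d_full = hotelling_dprime(delta_mu, cov)

# Full-rank linear transform A: images g -> A g, so the mean difference becomes
# A dmu and the covariance becomes A K A^T. The IO detectability is unchanged.
A = rng.normal(size=(n, n)) + n * np.eye(n)
d_transformed = hotelling_dprime(A @ delta_mu, A @ cov @ A.T)

# Compression: keep only k < n linear combinations (the first k rows of A).
# The IO can only lose (or keep) detectability under this projection.
k = 4
B = A[:k]
d_compressed = hotelling_dprime(B @ delta_mu, B @ cov @ B.T)

print(f"AUC, original images   : {auc_from_dprime(d_full):.4f}")
print(f"AUC, full-rank transform: {auc_from_dprime(d_transformed):.4f}")
print(f"AUC, compressed to k={k} : {auc_from_dprime(d_compressed):.4f}")
```

Because the IO uses the full likelihood ratio, any CNN drop after an invertible transform reflects what the network can learn, not a loss of image information.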