In this work, we present two new methods to overcome the lack of annotated long-wavelength infrared (LWIR) data by exploiting the abundance of similar RGB imagery. We introduce a novel unsupervised adaptation of the CycleGAN architecture for translating between non-corresponding LWIR and RGB datasets. Our ultimate goal is high detection rates on synthetic RGB and real LWIR imagery while using only labelled RGB imagery to train the detection algorithms. First, we translate LWIR imagery to RGB so that an RGB-trained detection algorithm can be applied, thereby removing the need for labelled LWIR imagery when training detection algorithms. Second, we translate RGB imagery to LWIR to fine-tune a network for detection in real LWIR imagery. Experimental results show that our adaptation helps create synthetic RGB imagery with higher detection rates across two different datasets. We also find that combining the synthetic RGB and real LWIR imagery produces higher F1 scores from the RGB-trained detection network, and that fine-tuning detection networks with synthetic LWIR imagery before testing on real LWIR imagery produces the highest F1 scores.
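At the core of this kind of unpaired LWIR/RGB translation is CycleGAN's cycle-consistency term, which the abstract's adaptation builds on. The following is a minimal PyTorch sketch of that term only, not the authors' full method: the generator names `G_rgb` and `G_lwir`, the weight `lambda_cyc`, and the stand-in convolutional generators are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def cycle_consistency_loss(G_rgb, G_lwir, lwir_batch, rgb_batch, lambda_cyc=10.0):
    """L1 reconstruction penalty after a round trip through both generators.

    G_rgb : generator mapping LWIR images to synthetic RGB.
    G_lwir: generator mapping RGB images to synthetic LWIR.
    """
    l1 = nn.L1Loss()
    # LWIR -> synthetic RGB -> reconstructed LWIR
    loss_lwir = l1(G_lwir(G_rgb(lwir_batch)), lwir_batch)
    # RGB -> synthetic LWIR -> reconstructed RGB
    loss_rgb = l1(G_rgb(G_lwir(rgb_batch)), rgb_batch)
    return lambda_cyc * (loss_lwir + loss_rgb)

if __name__ == "__main__":
    # Stand-in single-layer "generators" purely for demonstration.
    G_rgb = nn.Conv2d(1, 3, kernel_size=3, padding=1)   # LWIR (1ch) -> RGB (3ch)
    G_lwir = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # RGB (3ch) -> LWIR (1ch)
    lwir = torch.randn(4, 1, 64, 64)
    rgb = torch.randn(4, 3, 64, 64)
    print(cycle_consistency_loss(G_rgb, G_lwir, lwir, rgb))
```

In a full CycleGAN setup this term is combined with adversarial losses from two discriminators; the cycle term is what makes training possible without corresponding LWIR/RGB image pairs.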
Deep neural networks achieve state-of-the-art performance on object detection tasks with RGB data. However, multi-modal imagery offers many advantages for defence and security operations. For example, the IR modality enables persistent surveillance and is essential in poor lighting conditions and for 24-hour operation. It is therefore crucial to create an object detection system that can use IR imagery. Collecting and labelling large volumes of thermal imagery is expensive and time-consuming, so we propose to mobilise labelled RGB data to achieve detection in the IR modality. In this paper, we present a method for multi-modal object detection using unsupervised transfer learning and adaptation techniques. We train Faster R-CNN on RGB imagery and test on thermal imagery. The images contain two object classes, people and land vehicles, and represent real-life scenes with clutter and occlusions. We improve the baseline F1 score by up to 20% by training with an additional loss function that reduces the difference between RGB and IR feature maps. This work shows that unsupervised modality adaptation is possible, creating the opportunity to maximise the use of labelled RGB imagery for detection in multiple modalities. The novelty of this work includes the use of IR imagery, modality adaptation from RGB to IR for object detection, and the ability to use real-life imagery from uncontrolled environments. The practical impact for the defence and security community is an increase in performance and savings in the time and cost of data collection and annotation.
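The additional loss described above aligns intermediate feature maps across modalities. Below is a minimal sketch of one plausible form of such a term, assuming an L2 distance between backbone feature maps and a weighting factor `alpha`; the abstract does not state the exact distance or weighting the authors use.

```python
import torch
import torch.nn.functional as F

def modality_alignment_loss(rgb_features: torch.Tensor,
                            ir_features: torch.Tensor) -> torch.Tensor:
    """Penalise the discrepancy between backbone feature maps computed from
    RGB and IR imagery (an L2 distance is assumed here for illustration)."""
    return F.mse_loss(ir_features, rgb_features)

# Illustrative combined objective for detector training:
#   total = detection_loss(labelled RGB) + alpha * modality_alignment_loss(f_rgb, f_ir)
if __name__ == "__main__":
    f_rgb = torch.randn(4, 256, 50, 50)  # stand-in backbone features (RGB branch)
    f_ir = torch.randn(4, 256, 50, 50)   # stand-in backbone features (IR branch)
    alpha = 0.1                          # illustrative weighting, not from the paper
    print(alpha * modality_alignment_loss(f_rgb, f_ir))
```

Because the alignment term needs no IR labels, it can be computed on unlabelled thermal imagery while the standard Faster R-CNN detection losses are driven entirely by the labelled RGB data, which is what makes the adaptation unsupervised.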