Thermal face imagery, which captures the heat naturally emitted by the face, is far less widely available than face imagery in the visible spectrum. To help address this scarcity for research and algorithm development, we present the DEVCOM Army Research Laboratory Visible-Thermal Face Dataset (ARL-VTF). With over 500,000 images from 395 subjects, the ARL-VTF dataset represents, to the best of our knowledge, the largest collection of paired visible and thermal face images to date. The data was captured using a modern long-wave infrared (LWIR) camera mounted alongside a stereo setup of three visible-spectrum cameras. Variability in expression, pose, and eyewear was systematically recorded. The dataset has been curated with extensive annotations, metadata, and standardized evaluation protocols. Furthermore, this paper presents extensive benchmark results and analysis on thermal face landmark detection and thermal-to-visible face verification by evaluating state-of-the-art models on the ARL-VTF dataset.
Interest in thermal-to-visible face recognition has grown significantly over the last decade due to advancements in thermal infrared cameras and analytics beyond the visible spectrum. Despite large discrepancies between the thermal and visible spectra, existing approaches bridge the domain gap either by synthesizing visible faces from thermal faces or by learning cross-spectrum image representations. These approaches typically work well with frontal facial imagery collected at varying ranges and expressions, but exhibit significantly reduced performance when matching thermal faces with varying poses to frontal visible faces. We propose a novel Domain and Pose Invariant Framework (DPIF) that simultaneously learns domain- and pose-invariant representations. Our proposed framework is composed of modified networks for extracting the most correlated intermediate representations from off-pose thermal and frontal visible face imagery, a sub-network to jointly bridge the domain and pose gaps, and a joint loss function comprising cross-spectrum and pose-correction losses. We demonstrate the efficacy and advantages of the proposed method by evaluating on three thermal-visible datasets: ARL Visible-to-Thermal Face, ARL Multimodal Face, and Tufts Face. Although DPIF focuses on learning to match off-pose thermal to frontal visible faces, we also show that DPIF enhances performance when matching frontal thermal face images to frontal visible face images.
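The abstract above names a joint loss built from a cross-spectrum term and a pose-correction term but does not give their form. The following is only an illustrative sketch of how two such terms might be combined, assuming both are mean squared errors over feature vectors and that they are summed with a scalar weight. The function names, the argument names, and the `pose_weight` parameter are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(
    thermal_feat: Sequence[float],
    visible_feat: Sequence[float],
    corrected_feat: Sequence[float],
    frontal_feat: Sequence[float],
    pose_weight: float = 1.0,
) -> float:
    """Weighted sum of two terms (an assumed form, not the paper's definition).

    cross-spectrum term: distance between thermal and visible features.
    pose-correction term: distance between pose-corrected off-pose features
    and the corresponding frontal features.
    """
    cross_spectrum = mean_squared_error(thermal_feat, visible_feat)
    pose_correction = mean_squared_error(corrected_feat, frontal_feat)
    return cross_spectrum + pose_weight * pose_correction
```

In this sketch, a larger `pose_weight` would push training toward correcting pose at the expense of cross-spectrum agreement; how the published framework balances the two terms is not stated in the abstract.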