This paper reviews the second challenge on spectral reconstruction from RGB images, i.e., the recovery of whole-scene hyperspectral (HS) information from a 3-channel RGB image. As in the previous challenge, two tracks were provided: (i) a "Clean" track, where HS images are estimated from noise-free RGBs that are themselves calculated numerically from the ground-truth HS images and supplied spectral sensitivity functions; and (ii) a "Real World" track, simulating capture by an uncalibrated and unknown camera, where the HS images are recovered from noisy JPEG-compressed RGB images. A new, larger-than-ever natural hyperspectral image data set is presented, containing a total of 510 HS images. The Clean and Real World tracks had 103 and 78 registered participants respectively, with 14 teams competing in the final testing phase. A description of the proposed methods, alongside their challenge scores and an extensive evaluation of the top-performing methods, is also provided. Together, these gauge the state of the art in spectral reconstruction from an RGB image.
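The numerical RGB calculation described for the Clean track can be sketched as a per-pixel projection of each spectrum onto the camera sensitivities. This is only an illustration: the function name `simulate_rgb`, the array shapes, and the toy sensitivity values are assumptions, not the challenge's actual data or code.

```python
import numpy as np

def simulate_rgb(cube, sensitivities):
    """Project a hyperspectral cube onto camera sensitivities.

    cube:          (H, W, B) array, one spectrum of B bands per pixel
    sensitivities: (B, 3) array, one column per R, G, B channel
    returns:       (H, W, 3) noise-free RGB image
    """
    # Sum over the band axis b for every pixel (h, w) and channel c.
    return np.einsum("hwb,bc->hwc", cube, sensitivities)

# Toy data: a 2x2 image with 4 spectral bands (hypothetical sizes and values).
cube = np.ones((2, 2, 4))
sensitivities = np.array([[1.0, 0.0, 0.0],
                          [0.5, 0.5, 0.0],
                          [0.0, 0.5, 0.5],
                          [0.0, 0.0, 1.0]])
rgb = simulate_rgb(cube, sensitivities)
print(rgb.shape)
```

Because the cube is flat (all ones), every pixel's RGB is simply the column sum of the sensitivity matrix.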
Recently, Convolutional Neural Networks (CNNs) have been used to reconstruct hyperspectral information from RGB images, and this spectral reconstruction (SR) problem can often be solved with good (low) error. However, these methods are not physically plausible: when the recovered spectra are reintegrated with the underlying camera sensitivities, the resulting predicted RGB is not the same as the actual RGB, and sometimes this discrepancy can be large. The problem is further compounded by exposure change. Indeed, most learning-based SR models train for a fixed exposure setting, and we show that this can result in poor performance when exposure varies. In this paper we show how CNN learning can be extended so that physical plausibility is enforced and the problem resulting from changing exposures is mitigated. Our SR solution improves the state-of-the-art spectral recovery performance under varying exposure conditions while simultaneously ensuring physical plausibility (i.e., the recovered spectra reintegrate to the input RGBs exactly).
Spectral reconstruction (SR) algorithms attempt to recover hyperspectral information from RGB camera responses. In recent work, the most common metric for evaluating the performance of SR algorithms has been the Mean Relative Absolute Error (MRAE), an ℓ1 relative error (also known as percentage error). Unsurprisingly, the leading algorithms based on Deep Neural Networks (DNNs) are trained and tested using the MRAE metric. In contrast, the much simpler regression-based methods (which can actually work tolerably well) are trained to optimize a generic Root Mean Square Error (RMSE) and then tested in MRAE. Another issue with the regression methods is that, because the linear systems in SR are large and ill-posed, they are necessarily solved using regularization. However, the regularization has hitherto been applied at the spectrum level, whereas in MRAE the errors are measured per wavelength (i.e., per spectral channel) and then averaged. This paper has two aims: first, to reformulate the simple regressions so that they minimize a relative error metric in training (we formulate both ℓ2 and ℓ1 relative error variants, where the latter is MRAE); and, second, to adopt a per-channel regularization strategy. Together, our modifications to how the regressions are formulated and solved lead to improvements of up to 14% in mean performance and up to 17% in worst-case performance (measured with MRAE). Importantly, our best result narrows the gap between the regression approaches and the leading DNN model to around 8% in mean accuracy.
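To make the metric mismatch concrete, the sketch below defines MRAE and RMSE and fits a ridge regression that minimizes an ℓ2 relative error with a separate penalty per spectral channel. It is one plausible reading of the idea, not the paper's formulation: the function names, the feature matrix `X`, the penalty values `gammas`, and the synthetic data are all assumptions.

```python
import numpy as np

def mrae(pred, truth):
    """Mean Relative Absolute Error: |pred - truth| / truth, averaged over all entries."""
    return float(np.mean(np.abs(pred - truth) / truth))

def rmse(pred, truth):
    """Root Mean Square Error over all entries."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def fit_relative_ridge(X, Y, gammas):
    """Per-channel ridge regression on the l2 relative error.

    X:      (n, d) regression features (e.g. RGBs or an expansion of them)
    Y:      (n, c) strictly positive ground-truth spectra
    gammas: (c,)   one regularization weight per spectral channel (assumed given)
    returns W of shape (d, c)
    """
    n, d = X.shape
    W = np.empty((d, Y.shape[1]))
    for k in range(Y.shape[1]):
        # Dividing row i by Y[i, k] turns the residual (x_i.w - y_ik) / y_ik into A w - 1.
        A = X / Y[:, [k]]
        W[:, k] = np.linalg.solve(A.T @ A + gammas[k] * np.eye(d), A.T @ np.ones(n))
    return W

# Synthetic, exactly linear data (hypothetical sizes) to exercise the functions.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(50, 3))
W_true = rng.uniform(0.1, 1.0, size=(3, 5))
Y = X @ W_true
W = fit_relative_ridge(X, Y, gammas=np.full(5, 1e-9))
print(mrae(X @ W, Y), rmse(X @ W, Y))
```

Each channel is solved independently, so its penalty can be tuned on its own, which is the sense in which the regularization is per-channel rather than per-spectrum. An ℓ1 relative error (MRAE itself) has no closed-form solution of this kind and would need an iterative solver.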
Spectral reconstruction algorithms recover spectra from RGB sensor responses. Recent methods, the very best of which use deep learning, can already solve this problem with good spectral accuracy. However, the recovered spectra are physically incorrect in that they do not induce the RGBs from which they are recovered. Moreover, if the exposure of the RGB image changes, the recovery performance often degrades significantly; i.e., most contemporary methods only work for a fixed exposure. In this paper, we develop a physically accurate recovery method: the spectra we recover provably induce the same RGBs. Key to our approach is the idea that the set of spectra that integrate to the same RGB can be expressed as the sum of a unique fundamental metamer (spanned by the camera’s spectral sensitivities and linearly related to the RGB) and a metameric black, i.e., a vector from the linear space orthogonal to the spectral sensitivities. Physically plausible spectral recovery then amounts to finding a spectrum that adheres to this fundamental-metamer-plus-metameric-black decomposition. To further ensure spectral recovery that is robust to changes in exposure, we incorporate exposure changes in the training stage of the developed method. In experiments, we evaluate how well the methods recover spectra and predict both the actual RGBs and the RGBs under different viewing conditions (changing illuminations and/or cameras). The results show that our method generally improves on state-of-the-art spectral recovery (with more stable performance when exposure varies) and provides zero colorimetric error. Moreover, our method significantly improves color fidelity under different viewing conditions, with up to a 60% reduction in error in some cases.
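The fundamental metamer and metameric black decomposition can be illustrated with plain linear algebra. The sketch below applies it as a post-hoc correction to an arbitrary spectral estimate, which is a simplification chosen for illustration and not the training scheme of the paper; the function names, the 31-band size, and the random sensitivities are assumptions.

```python
import numpy as np

def fundamental_metamer(S, rgb):
    """Spectrum inside span(S) that integrates to rgb. S: (B, 3), rgb: (3,)."""
    return S @ np.linalg.solve(S.T @ S, rgb)

def metameric_black(S, spectrum):
    """Part of `spectrum` orthogonal to the columns of S; it integrates to zero RGB."""
    return spectrum - S @ np.linalg.solve(S.T @ S, S.T @ spectrum)

def make_plausible(S, rgb, estimate):
    """Keep the estimate's metameric black, replace the rest by the fundamental metamer."""
    return fundamental_metamer(S, rgb) + metameric_black(S, estimate)

# Hypothetical camera with 31 bands and a noisy spectral estimate.
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=(31, 3))
truth = rng.uniform(0.0, 1.0, size=31)
rgb = S.T @ truth
estimate = truth + 0.05 * rng.standard_normal(31)

fixed = make_plausible(S, rgb, estimate)
print(np.allclose(S.T @ fixed, rgb))                                 # reintegrates to the input RGB
print(np.linalg.norm(fixed - truth) <= np.linalg.norm(estimate - truth))
```

Since the fundamental metamer is linear in the RGB, scaling the exposure by a factor scales it by the same factor. The corrected spectrum differs from the ground truth only by the estimate's error within the metameric black space, so its ℓ2 error cannot exceed that of the uncorrected estimate. Nothing in this sketch prevents negative values in the corrected spectrum.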
Recently, Convolutional Neural Networks (CNNs) have been used to reconstruct hyperspectral information from RGB images, and this spectral reconstruction (SR) problem can often be solved with good (low) error. However, little attention has been paid to whether the behavior of these models adheres to physics. We show that the leading CNN method introduces unexpected 'colorimetric errors', meaning that the recovered spectra do not reproduce the ground-truth RGBs, and sometimes this discrepancy can be large. The problem is further compounded by exposure change. Indeed, most CNN models over-fit to a fixed exposure, and we demonstrate that this can result in poor performance when exposure varies. In this paper we show how CNN learning can be extended so that the physical plausibility of SR is enforced. Remarkably, our physically plausible CNN solutions improve both the spectral and the colorimetric performance of the original network, while applying data augmentation trades some network performance for stability under varying exposure.
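One simple way to picture exposure augmentation is to scale each training pair by a random factor. The sketch assumes linear (non-clipped, non-gamma-encoded) RGBs, so that an exposure change multiplies the RGB and its spectrum by the same constant; the function name and the scale range of 0.5 to 2.0 are hypothetical and not taken from the paper.

```python
import numpy as np

def augment_exposure(rgb, spectra, rng, low=0.5, high=2.0):
    """Scale every (RGB, spectrum) pair by its own random exposure factor.

    rgb:     (n, 3) linear RGB responses
    spectra: (n, B) matching ground-truth spectra
    rng:     numpy random Generator
    """
    k = rng.uniform(low, high, size=(rgb.shape[0], 1))  # one factor per sample
    return rgb * k, spectra * k

# Hypothetical data: 8 spectra with 31 bands and a random camera.
rng = np.random.default_rng(1)
S = rng.uniform(size=(31, 3))
spectra = rng.uniform(size=(8, 31))
rgb = spectra @ S

rgb_aug, spectra_aug = augment_exposure(rgb, spectra, rng)
print(np.allclose(spectra_aug @ S, rgb_aug))  # scaled pairs remain physically consistent
```

Scaling both members of a pair by the same factor keeps the augmented spectrum consistent with the augmented RGB, so the augmentation does not itself introduce colorimetric error.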