Light field cameras record not only the spatial information of observed scenes but also the directions of all incoming light rays. This spatial and angular information implicitly encodes geometric characteristics such as multi-view and epipolar geometry, which can be exploited to improve depth estimation. An epipolar plane image (EPI), a 2D spatial-angular slice of the light field, contains patterns of oriented lines whose slopes are associated with disparity. Benefiting from this property of EPIs, some representative methods estimate depth maps by analyzing the disparity of each line in the EPIs. However, these methods often extract the optimal slope of each line in isolation, ignoring the relationships between neighboring pixels, which leads to inaccurate depth predictions. Based on the observation that an oriented line and its neighboring pixels in an EPI share a similar linear structure, we propose an end-to-end fully convolutional network (FCN) that estimates the depth of the scene point at the intersection of the horizontal and vertical EPIs. Specifically, we present a new feature-extraction module, called the Oriented Relation Module (ORM), that models the relationships between line orientations. To facilitate training, we also propose a refocusing-based data augmentation method that produces different slopes in the EPIs of the same scene point. Extensive experiments verify the efficacy of learning relations and show that our approach is competitive with other state-of-the-art methods. The code and trained models are available at https://github.com/lkyahpu/EPI_ORM.git.
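A minimal numerical illustration of the slope-disparity property this abstract relies on (not the paper's ORM network): a scene point at disparity d appears in view s at spatial position u = u0 + d*s, so it traces a line in the EPI whose slope equals its disparity. All names and values below are illustrative.

```python
import numpy as np

num_views, width = 9, 64          # angular and spatial resolution of the EPI
true_disparity, u0 = 1.5, 20.0    # ground-truth slope and reference position

# Build a synthetic horizontal EPI: one bright (Gaussian) point per view,
# shifted by the disparity from view to view.
u = np.arange(width)
epi = np.stack([np.exp(-0.5 * (u - (u0 + true_disparity * s)) ** 2)
                for s in range(num_views)])          # shape: (num_views, width)

# Recover the line slope (= disparity) by locating the intensity peak in each
# view (sub-pixel, via a centroid) and fitting a line across the views.
centroids = (epi * u).sum(axis=1) / epi.sum(axis=1)
s = np.arange(num_views)
est_disparity = np.polyfit(s, centroids, 1)[0]

print(f"true disparity {true_disparity:.3f}, estimated {est_disparity:.3f}")
```

The ORM-based network in the abstract replaces this per-line slope analysis with learned features that also account for the neighboring pixels sharing the same linear structure.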
Single-frame infrared small target detection remains a challenging task due to complex backgrounds and the weak structural characteristics of small targets. Recently, convolutional neural networks (CNNs) have been introduced to infrared small target detection and are widely used for their excellent performance. However, existing CNN-based methods focus mainly on local spatial features while ignoring the long-range contextual dependencies between small targets and backgrounds. To capture global context-aware information, we propose a fusion network architecture of Transformer and CNN (FTC-Net), which consists of two branches. The CNN-based branch uses a U-Net with skip connections to obtain low-level local details of small targets. The Transformer-based branch applies hierarchical self-attention to learn long-range contextual dependencies; specifically, it can suppress background interference and enhance target features. To obtain both local and global feature representations, we design a feature fusion module (FFM) that combines the features of the two branches. We conduct ablation and comparative experiments on the publicly available SIRST dataset. Experimental results show that the Transformer-based branch is effective and demonstrate the superiority of the proposed FTC-Net over other state-of-the-art methods.
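A minimal PyTorch sketch of the two-branch idea described above: a local CNN branch plus a self-attention branch whose outputs are fused by concatenation. This is an illustrative assumption about the structure, not the authors' FTC-Net; all layer sizes and module names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTwoBranchNet(nn.Module):
    def __init__(self, channels=32, heads=4):
        super().__init__()
        # CNN branch: low-level local details (a stand-in for the U-Net branch).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        # Transformer branch: long-range context over flattened patch tokens.
        self.embed = nn.Conv2d(1, channels, kernel_size=8, stride=8)  # patchify
        self.attn = nn.TransformerEncoderLayer(d_model=channels, nhead=heads,
                                               batch_first=True)
        # Fusion: concatenate both branches, project, and predict a target mask.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 1))

    def forward(self, x):                      # x: (B, 1, H, W), H, W % 8 == 0
        local_feat = self.cnn(x)               # (B, C, H, W)
        tokens = self.embed(x)                 # (B, C, H/8, W/8)
        b, c, h, w = tokens.shape
        ctx = self.attn(tokens.flatten(2).transpose(1, 2))       # (B, h*w, C)
        ctx = ctx.transpose(1, 2).reshape(b, c, h, w)
        ctx = F.interpolate(ctx, size=x.shape[-2:], mode='bilinear',
                            align_corners=False)
        return torch.sigmoid(self.fuse(torch.cat([local_feat, ctx], dim=1)))

# Example: a 256x256 single-channel infrared frame -> per-pixel target mask.
mask = TinyTwoBranchNet()(torch.randn(1, 1, 256, 256))
print(mask.shape)  # torch.Size([1, 1, 256, 256])
```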
The fusion of infrared intensity and polarization images can generate a single image with better visual perception and richer information. Existing fusion methods based on convolutional neural networks (CNNs), which extract only local features, are limited in fully exploiting the salient target features of polarization. In this Letter, we propose a Transformer-based deep network to improve the performance of infrared polarization image fusion. Compared with existing CNN-based methods, our model encodes long-range features of infrared polarization images to obtain global contextual information using the self-attention mechanism. We also design a loss function with a self-supervised constraint to boost fusion performance. Experiments on a public infrared polarization dataset validate the effectiveness of the proposed method. Our approach achieves better fusion performance than the state of the art.
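A hedged sketch of the kind of self-supervised fusion loss the abstract refers to: with no ground-truth fused image, the output is constrained directly by the two inputs (intensity fidelity to S0, edge fidelity to DoLP). The specific terms and weights below are assumptions for illustration, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def sobel_gradients(img):
    """Horizontal and vertical Sobel gradients of a (B, 1, H, W) image."""
    kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
    ky = kx.transpose(2, 3)
    return F.conv2d(img, kx, padding=1), F.conv2d(img, ky, padding=1)

def fusion_loss(fused, s0, dolp, w_int=1.0, w_grad=1.0):
    # Intensity term: keep the fused image close to the stronger of the two
    # source responses at every pixel.
    loss_int = F.l1_loss(fused, torch.maximum(s0, dolp))
    # Gradient term: preserve the sharper edges from either source.
    fx, fy = sobel_gradients(fused)
    sx, sy = sobel_gradients(s0)
    dx, dy = sobel_gradients(dolp)
    gx = torch.maximum(sx.abs(), dx.abs())
    gy = torch.maximum(sy.abs(), dy.abs())
    loss_grad = F.l1_loss(fx.abs(), gx) + F.l1_loss(fy.abs(), gy)
    return w_int * loss_int + w_grad * loss_grad
```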
Infrared polarization image fusion integrates intensity and polarization information, producing a fused image that enhances visibility and captures crucial details. However, in complex environments, polarization imaging is susceptible to noise interference. Existing fusion methods typically fuse the infrared intensity (S0) and degree of linear polarization (DoLP) images but fail to account for noise interference, leading to reduced performance. To cope with this problem, we propose a fusion method based on a polarization salient prior, which extends DoLP with the angle of polarization (AoP) and introduces the polarization distance (PD) to obtain salient target features. Moreover, according to the distribution difference between S0 and DoLP features, we construct a fusion network based on attention-guided filtering, utilizing cross-attention to generate the filter kernels for fusion. Quantitative and qualitative experimental results validate the effectiveness of our approach. Compared with other fusion methods, our method can effectively suppress noise interference and preserve salient target features.
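For reference, a brief sketch of the standard Stokes-parameter quantities the abstract builds on, computed from four polarizer orientations (0/45/90/135 degrees). The paper's polarization distance (PD) construction from AoP is specific to that work and is not reproduced here; the function name is illustrative.

```python
import numpy as np

def stokes_from_polarizer_images(i0, i45, i90, i135, eps=1e-6):
    """i0..i135: co-registered intensity images taken behind a linear polarizer."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)           # total intensity
    s1 = i0 - i90                                # horizontal vs. vertical
    s2 = i45 - i135                              # +45 vs. -45 degrees
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)   # degree of linear polarization
    aop = 0.5 * np.arctan2(s2, s1)               # angle of polarization (radians)
    return s0, dolp, aop
```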