2023
DOI: 10.3390/rs15092395
DCAT: Dual Cross-Attention-Based Transformer for Change Detection

Abstract: Several transformer-based methods for change detection (CD) in remote sensing images have been proposed, with Siamese-based methods showing promising results due to their two-stream feature extraction structure. However, these methods ignore the potential of the cross-attention mechanism to improve change feature discrimination and thus, may limit the final performance. Additionally, using either high-frequency-like fast change or low-frequency-like slow change alone may not effectively represent complex bi-te…

Cited by 10 publications (4 citation statements)
References 66 publications
“…The structure of this network is complicated, and its computational efficiency is low. Zhou et al. [31] introduced a dual cross-attention transformer (DCAT) network. This network is designed to extract both low-frequency and high-frequency information from input images through the computation of two distinct types of cross-attention features.…”
Section: Related Work
Confidence: 99%
“…Immediately after, we use the DTT decoder to reweight the original features based on the generated tokens to obtain the refined features considering the dual-temporal contextual relationships. While some works, such as [21][22][23], employ transformers based on cross-attention for change detection, their proposed cross-attention merely involves the straightforward calculation of attention matrices using the query (Q) from another temporal phase and the key (K) from the current temporal phase. This approach fails to adequately model the non-local structural relationships depicted in Figure 1.…”
Section: Introduction
Confidence: 99%
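The cross-attention pattern this statement describes — the query (Q) taken from one temporal phase and the key (K) and value from the other — can be illustrated with a minimal NumPy sketch. This is a hypothetical single-head example with identity Q/K/V projections, not the cited papers' actual implementation; a real model would use learned projection matrices and multi-head attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_cross_attention(feat_q, feat_kv):
    """Single-head cross-attention between bi-temporal token sequences.

    Q comes from one temporal phase, K and V from the other, so each
    token of the first image attends over the tokens of the second.
    feat_q, feat_kv: (num_tokens, dim) flattened feature maps.
    """
    d = feat_q.shape[-1]
    scores = feat_q @ feat_kv.T / np.sqrt(d)  # (n_q, n_kv) cross-phase affinities
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ feat_kv                     # phase-1 tokens re-expressed via phase-2 content

rng = np.random.default_rng(0)
tokens_t1 = rng.standard_normal((16, 32))  # 16 tokens, 32-dim features, time 1
tokens_t2 = rng.standard_normal((16, 32))  # time 2
fused = temporal_cross_attention(tokens_t1, tokens_t2)
print(fused.shape)  # (16, 32)
```

The attention matrix here encodes only pairwise Q·K affinities between the two phases, which is the "straightforward calculation" the quoted passage argues is insufficient for modeling non-local structural relationships.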
“…For instance, simple skip connections in encoder and decoder features can lead to semantic gaps. Inspired by achievements in the field of medical image segmentation [27][28][29], this paper introduces a Dual Cross-Attention module (DCA) based on the UNET architecture, incorporating Channel Cross-Attention (CCA) and Spatial Cross-Attention (SCA) mechanisms. This DCA module adaptively captures channel and spatial dependencies between multi-scale encoder features in sequence to address the semantic gaps between encoder and decoder features in the UNET architecture.…”
Section: Introduction
Confidence: 99%
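The distinction this statement draws between Channel Cross-Attention (CCA) and Spatial Cross-Attention (SCA) comes down to what the attention tokens are: channels (a C×C attention map) versus spatial positions (an N×N map, with N = H·W). The sketch below, again a hypothetical NumPy illustration with identity projections rather than the cited module's implementation, shows both variants applied between encoder and decoder features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(enc_feat, dec_feat):
    """Tokens are channels: attention map is (C, C).

    enc_feat, dec_feat: (C, N) with N = H*W flattened positions.
    Decoder channels query encoder channels to bridge the semantic gap.
    """
    C, N = enc_feat.shape
    scores = dec_feat @ enc_feat.T / np.sqrt(N)   # (C, C) channel affinities
    return softmax(scores, axis=-1) @ enc_feat    # (C, N) re-weighted channels

def spatial_cross_attention(enc_feat, dec_feat):
    """Tokens are spatial positions: attention map is (N, N)."""
    C, N = enc_feat.shape
    Q, K, V = dec_feat.T, enc_feat.T, enc_feat.T  # (N, C) each
    scores = Q @ K.T / np.sqrt(C)                 # (N, N) position affinities
    return (softmax(scores, axis=-1) @ V).T       # back to (C, N)

rng = np.random.default_rng(1)
enc = rng.standard_normal((8, 64))  # 8 channels, 8x8 spatial grid flattened
dec = rng.standard_normal((8, 64))
cca_out = channel_cross_attention(enc, dec)
sca_out = spatial_cross_attention(enc, dec)
print(cca_out.shape, sca_out.shape)  # (8, 64) (8, 64)
```

Applying the two in sequence, as the passage describes, lets the module capture channel dependencies first and spatial dependencies second before the features are passed to the decoder.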