Global–Local Transformer Network for HSI and LiDAR Data Joint Classification

Ding, Kexing; Lu, Ting; Fu, Wei; Li, Shutao; Ma, Fuyan

doi:10.1109/tgrs.2022.3216319

Cited by 38 publications

(8 citation statements)

References 45 publications

“…On the one hand, WMM [35], online multiview deep forest (OMDF) [47], and Kronecker product (KP) [38] are compared as non-DL fusion methods to demonstrate the superiority of neural networks in feature extraction. On the other hand, five DL-based fusion methods, including two-branch CNN (t-CNN) [48], feature intersecting learning-based CNN (FIL-CNN) [49], cross channel reconstruction network (CCR-Net) [50], global-local Transformer (GLT) [51], and multi-modal fusion network (MFNet) [52], are chosen for comparison. Specifically, t-CNN, FIL-CNN, CCR-Net, and MFNet are all based on CNN for feature extraction, while GLT is based on CNN and Transformer for feature extraction.…”

Section: Performance Comparison (mentioning)

confidence: 99%

PolSAR-MPIformer: A Vision Transformer Based on Mixed Patch Interaction for Dual-Frequency PolSAR Image Adaptive Fusion Classification

Xin, Li, et al. 2024

IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing

Vision Transformer (ViT) provides new ideas for polarimetric synthetic aperture radar (PolSAR) image classification due to its advantages in learning global-spatial information. However, the lack of local-spatial information within samples and correlation information among samples, as well as the complexity of network structure, limit the application of ViT in practice. In addition, dual-frequency PolSAR data provides rich information, but there are fewer related studies compared to single-frequency classification algorithms. In this paper, we adopt ViT as the basic framework, and propose a novel model based on mixed patch interaction for dual-frequency PolSAR image adaptive fusion classification (PolSAR-MPIformer). First, a mixed patch interaction (MPI) module is designed for feature extraction, which replaces the high-complexity self-attention in ViT with patch interaction intra- and inter-sample. Besides the global-spatial information learning within samples by ViT, the MPI module adds the learning of local-spatial information within samples and correlation information among samples, thereby obtaining more discriminative features through a low-complexity network. Subsequently, a dual-frequency adaptive fusion (DAF) module is constructed as the classifier of PolSAR-MPIformer. On the one hand, the attention mechanism is utilized in DAF to reduce the impact of speckle noise while preserving details. On the other hand, the DAF evaluates the classification confidence of each band and assigns different weights accordingly, which achieves reasonable utilization of the complementarity between dual-frequency data and improves classification accuracy. Experiments on four real dual-frequency PolSAR datasets substantiate the superiority of the proposed PolSAR-MPIformer over other state-of-the-art algorithms.

“…For example, Zhuo et al. [52] simultaneously utilized multiscale CNN and multihop GCN to capture multiscale features containing local-global structural relationships. A novel global-local transformer network [53] learns local spatial features using multiscale aggregated CNN and extracts global spectral sequence properties using ViT. Taking global spatial context into account, [54] learns discriminative spatial features by overcoming the limitation of the receptive field and develops a dual-view spectral aggregation model to capture short- and long-view spectral features.…”

Section: B Global-local Feature Extraction Network For Rs Image Proce... (mentioning)

confidence: 99%

Dual Attention-Based Global-Local Feature Extraction Network for Unsupervised Change Detection in PolSAR Images

Xu, Li, et al. 2024

IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing

Due to the interference of multiplicative speckles, it is challenging to accurately detect changes in polarimetric synthetic aperture radar (PolSAR) images. Convolutional neural network (CNN) has been proven to learn rich local features from PolSAR data. However, convolution kernels with limited receptive fields have difficulty in exploring global information. Here, a dual-attention-based global-local feature extraction network (DA-GLN) is developed for unsupervised PolSAR image change detection (CD). First, we use fuzzy C-means clustering on the enhanced Shannon entropy difference image to automatically generate pseudo-labeled samples required for unsupervised CD. Subsequently, our DA-GLN utilizes a deep residual shrinkage network (DRSN) that incorporates channel attention mechanisms and soft-thresholding to weaken the influence of speckle noise and capture local features. Meanwhile, a pooling-based vision transformer (PiT) is adopted in DA-GLN to extract global features, which introduces pooling layers to complete self-attention spatial information interaction with higher efficiency than the visual transformer (ViT). Furthermore, a global-local constraint feature fusion (GLCFF) strategy is designed to effectively fuse local and global features. Finally, we employ a feature constraint-focal loss (FC-F loss) function including feature constraint loss and focal loss as the objective function of DA-GLN. Specifically, the feature constraint loss function is constructed to eliminate feature redundancy and fully exploit the complementarity between features, while the focal loss function is introduced to balance the impact of the inequality between changed and unchanged samples on the network. Numerical experiments on five real spaceborne PolSAR datasets demonstrate that our DA-GLN is more competitive than other state-of-the-art methods.

“…Therefore, inspired by the classification of HSI, researchers have applied the fusion model of CNN and transformer to the joint classification task of HSI and LiDAR-DSM. Ding et al. [25] introduced the Global-Local Transformer Network (GLT-Net), designed to capture the global-local correlation features from inputs, effectively enhancing classification outcomes. This method only concatenated features from HSI and LiDAR-DSM without deep information fusion learning.…”

Section: Introduction (mentioning)

confidence: 99%

Joint Classification of Hyperspectral and LiDAR Data Based on Adaptive Gating Mechanism and Learnable Transformer

Wang, Sun, Xiang et al. 2024

Remote Sensing

Utilizing multi-modal data, as opposed to only hyperspectral image (HSI), enhances target identification accuracy in remote sensing. Transformers are applied to multi-modal data classification for their long-range dependency but often overlook intrinsic image structure by directly flattening image blocks into vectors. Moreover, as the encoder deepens, unprofitable information negatively impacts classification performance. Therefore, this paper proposes a learnable transformer with an adaptive gating mechanism (AGMLT). Firstly, a spectral–spatial adaptive gating mechanism (SSAGM) is designed to comprehensively extract the local information from images. It mainly contains point depthwise attention (PDWA) and asymmetric depthwise attention (ADWA). The former is for extracting spectral information of HSI, and the latter is for extracting spatial information of HSI and elevation information of LiDAR-derived rasterized digital surface models (LiDAR-DSM). By omitting linear layers, local continuity is maintained. Then, the layer Scale and learnable transition matrix are introduced to the original transformer encoder and self-attention to form the learnable transformer (L-Former). It improves data dynamics and prevents performance degradation as the encoder deepens. Subsequently, learnable cross-attention (LC-Attention) with the learnable transfer matrix is designed to augment the fusion of multi-modal data by enriching feature information. Finally, poly loss, known for its adaptability with multi-modal data, is employed in training the model. Experiments in the paper are conducted on four famous multi-modal datasets: Trento (TR), MUUFL (MU), Augsburg (AU), and Houston2013 (HU). The results show that AGMLT achieves optimal performance over some existing models.
