2023
DOI: 10.1109/jstars.2022.3230835

Vision Transformer With Contrastive Learning for Remote Sensing Image Scene Classification

Abstract: Remote sensing images (RSIs) are characterized by complex spatial layouts and ground object structures. The vision transformer (ViT) can be a good choice for scene classification owing to its ability to capture long-range interactions between patches of the input image. However, lacking some of the inductive biases inherent to CNNs, such as locality and translation equivariance, ViT cannot generalize well when trained on insufficient amounts of data. Compared with training ViT from scratch, transferring a large-scale pre-t…
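The abstract points to transferring a large-scale pre-trained ViT and pairing it with contrastive learning. As a rough sketch only (not the authors' exact pipeline), the following fine-tunes a pre-trained ViT backbone from the timm library with a SimCLR-style NT-Xent contrastive loss on two augmented views of each batch; the model name, loss choice, temperature, and batch size are illustrative assumptions.

```python
# Minimal sketch, NOT the paper's exact method: contrastive fine-tuning
# of an ImageNet-pretrained ViT (via timm) with a SimCLR-style NT-Xent
# loss. Model name and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F
import timm

# num_classes=0 makes the model return pooled features instead of logits.
backbone = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over paired embeddings: row i's positive is row i+N (and vice versa)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, D), unit-normalized
    sim = z @ z.t() / tau                         # temperature-scaled cosine similarities
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Two augmented views of the same batch of scene images (random stand-ins here).
v1, v2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
loss = nt_xent(backbone(v1), backbone(v2))
loss.backward()
```

In practice the contrastive term would likely be combined with a cross-entropy head on the scene labels, but the exact combination used in the paper is not visible from the truncated abstract.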

Cited by 24 publications (14 citation statements)
References: 49 publications
“…E.g., in the study (Ma et al., 2022), the overlapped categories for AID include 5 pairs, and those for NWPU seem to be more than 10. In the study (Bi et al., 2023), the separation of categories for AID is fine, but there are 8 pairs of overlapped categories for NWPU. In general, the comparison of t-SNE results is still consistent with the methods' OA performance.…”
Section: Results (mentioning)
Confidence: 99%
“…In general, the comparison of t-SNE results is still consistent with the methods' OA performance. Namely, as evaluated by OA, MBC-Net is the best, while the method of Bi et al. (2023) ranks second. Therefore, the above evidence validates the effectiveness and superiority of the feature representations obtained by MBC-Net.…”
Section: Results (mentioning)
Confidence: 99%
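The two statements above judge methods by how much category clusters overlap in t-SNE projections of learned features. A minimal sketch of that kind of check, with random arrays standing in for extracted features and labels (shapes and class count are assumptions):

```python
# Minimal sketch: t-SNE projection of per-image features, where
# overlapping category clusters suggest confusable scene classes.
# Random data stands in for real extracted features.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 768))       # stand-in for extracted scene features
labels = rng.integers(0, 10, size=300)    # stand-in for 10 scene categories

emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(feats)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=8)
plt.title("t-SNE of scene features")
plt.show()
```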
“…Additionally, Sha and Li (2022) introduced a multiple-instance learning method utilising ViTs. Bi et al. (2023), X. and Y. proposed innovative ViT-based approaches via contrastive learning. However, despite their larger number of parameters, these ViT-based approaches typically achieve only average accuracy compared to certain CNN-based methods.…”
Section: Related Work (mentioning)
Confidence: 99%