2023
DOI: 10.1109/jstars.2022.3230835

Vision Transformer With Contrastive Learning for Remote Sensing Image Scene Classification

Abstract: Remote sensing images (RSIs) are characterized by complex spatial layouts and ground object structures. The vision transformer (ViT) can be a good choice for scene classification owing to its ability to capture long-range interactions between patches of the input image. However, lacking some of the inductive biases inherent to CNNs, such as locality and translation equivariance, ViT cannot generalize well when trained on insufficient amounts of data. Compared with training ViT from scratch, transferring a large-scale pre-t…
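The abstract points to transferring a large-scale pre-trained ViT and pairing it with contrastive learning. As a rough sketch only (not the authors' exact pipeline), the following fine-tunes a pre-trained ViT backbone from the timm library with a SimCLR-style NT-Xent contrastive loss on two augmented views of each batch; the model name, loss choice, temperature, and batch size are illustrative assumptions.

```python
# Minimal sketch, NOT the paper's exact method: contrastive fine-tuning
# of an ImageNet-pretrained ViT (via timm) with a SimCLR-style NT-Xent
# loss. Model name and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F
import timm

# num_classes=0 makes the model return pooled features instead of logits.
backbone = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over paired embeddings: row i's positive is row i+N (and vice versa)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, D), unit-normalized
    sim = z @ z.t() / tau                         # temperature-scaled cosine similarities
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Two augmented views of the same batch of scene images (random stand-ins here).
v1, v2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
loss = nt_xent(backbone(v1), backbone(v2))
loss.backward()
```

In practice the contrastive term would likely be combined with a cross-entropy head on the scene labels, but the exact combination used in the paper is not visible from the truncated abstract.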

Cited by 24 publications (14 citation statements)
References: 49 publications
“…E.g., in the study (Ma et al., 2022), the overlapped categories for AID include 5 pairs, and those for NWPU seem to be more than 10. In the study (Bi et al., 2023), the separation of categories for AID is fine, but there are 8 pairs of overlapped categories for NWPU. In general, the comparison of t-SNE results is still consistent with the methods' OA performance.…”
Section: Results (mentioning)
Confidence: 99%
“…In general, the comparison of t-SNE results is still consistent with the methods' OA performance. Namely, as evaluated by OA, MBC-Net is the best, while the method of Bi et al. (2023) ranks second. Therefore, the above evidence validates the effectiveness and superiority of the feature representations obtained by MBC-Net.…”
Section: Results (mentioning)
Confidence: 99%
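The two statements above judge methods by how much category clusters overlap in t-SNE projections of learned features. A minimal sketch of that kind of check, with random arrays standing in for extracted features and labels (shapes and class count are assumptions):

```python
# Minimal sketch: t-SNE projection of per-image features, where
# overlapping category clusters suggest confusable scene classes.
# Random data stands in for real extracted features.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 768))       # stand-in for extracted scene features
labels = rng.integers(0, 10, size=300)    # stand-in for 10 scene categories

emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(feats)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=8)
plt.title("t-SNE of scene features")
plt.show()
```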
“…Additionally, Sha and Li (2022) introduced a multiple-instance learning method utilising ViTs. Bi et al. (2023), X. and Y. proposed innovative ViT-based approaches via contrastive learning. However, despite their larger number of parameters, these ViT-based approaches typically achieve only average accuracy compared to certain CNN-based methods.…”
Section: Related Work (mentioning)
Confidence: 99%