2023
DOI: 10.1007/s10462-023-10595-0

A survey of the vision transformers and their CNN-transformer based variants

Asifullah Khan, Zunaira Rauf, Anabia Sohail, et al.


Cited by 28 publications (5 citation statements)
References: 220 publications
“…To validate the generalizability of the method proposed in this paper, land cover classification experiments were conducted on the Houston 2013 dataset. Our method was compared with traditional machine learning algorithms and state-of-the-art methods in the field of deep learning, including CCF [53], CoSpace [54], Co-CNN [55], FusAT-Net [56], ViT [57], S2FL [49], Spectral-Former [58], CCR-Net [2], MFT [52], and DIMNet [59]. The specific results are shown in Figure 16, where (a) displays the DSM of LiDAR data, (b) shows the heatmap, (c) represents the three-band color composite for HSI spectral information, (d) shows the train ground-truth map, (e) shows the test ground-truth map, and (f) illustrates the classification results, with good contrast post-reconstruction.…”
Section: Land Cover Classification Experiments on the Houston 2013 Dataset
confidence: 99%
“…To validate the generalizability of the method proposed in this paper, land cover classification experiments were conducted on the Houston 2013 dataset. Our method was compared with traditional machine learning algorithms and state-of-the-art methods in the field of deep learning, including CCF [53], CoSpace [54], Co-CNN [55], FusAT-Net [56], ViT [57], S2FL [49], Spectral-Former [58], CCR-Net [2], MFT [52], and DIMNet [59]. The specific results are shown in Figure 16, where […] 4 for comparison, where the top outcomes are highlighted in bold.…”
Section: Land Cover Classification Experiments on the Houston 2013 Dataset
confidence: 99%
“…Recently, transformers have succeeded in various application fields, such as natural language processing (Kalyan et al., 2022) and computer vision (Khan et al., 2023). Their attention mechanism, capable of learning connections between sequence elements, has led to the development of transformer-based models such as Autoformer (Wu et al., 2021) and PatchTST (Nie et al., 2022) for time series representation.…”
Section: Related Work
confidence: 99%
“…Recently, transformers have succeeded in various application fields, such as natural language processing (Kalyan et al., 2022) and computer vision (Khan et al., 2023).…”
Section: Related Work
confidence: 99%
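The passage above attributes the transformer's strength to an attention mechanism that learns connections between sequence elements. The following minimal PyTorch sketch shows the core single-head scaled dot-product self-attention computation; the class name, shapes, and dimensions are illustrative assumptions for this report and are not taken from the cited works.

```python
# Minimal single-head scaled dot-product self-attention (illustrative sketch only;
# names and shapes are assumptions, not code from the cited papers).
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Linear maps producing queries, keys, and values from the input tokens.
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Pairwise similarity between every pair of sequence elements.
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        weights = scores.softmax(dim=-1)   # attention weights, each row sums to 1
        return weights @ v                 # weighted mixture of value vectors

# Example: 8 tokens of dimension 64 attend to one another.
tokens = torch.randn(1, 8, 64)
out = SelfAttention(64)(tokens)            # -> shape (1, 8, 64)
```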
“…Transformer-based attention mechanisms with deep semantic features have a larger receptive field; however, a larger downsampling factor results in a loss of positional information. In addition to the transformer-based self-attention mechanism used to form a feature map that focuses on interrelationships, attention mechanisms include channel attention, pixel attention, multilevel attention, and other methods of focusing on key features [8,20].…”
Section: Channel and Spatial Attention Component (CSAC)
confidence: 99%
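As a rough illustration of the channel and spatial attention mentioned alongside transformer self-attention, here is a CBAM-style sketch on a convolutional feature map. The module names, reduction ratio, and kernel size are assumptions for this example and do not reproduce the citing paper's CSAC module.

```python
# Sketch of channel attention followed by spatial attention on a feature map
# (CBAM-style illustration under assumed hyperparameters, not the paper's CSAC).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Squeeze spatial dimensions, then re-weight each channel.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        w = self.mlp(x.mean(dim=(2, 3)))   # per-channel importance in [0, 1]
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Aggregate across channels, then learn a per-pixel weight map.
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

feat = torch.randn(1, 32, 16, 16)
refined = SpatialAttention()(ChannelAttention(32)(feat))   # same shape as feat
```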