2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00041

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

Cited by 790 publications (244 citation statements)
References 30 publications
“…non-overlapped). However, as pointed out by prior works like T2T [51], CrossViT [5], and PiT [19], using a stronger but still simple subnetwork for the tokenization could further improve the performance, especially for the smaller models. Thus, we adopt one convolutional layer with overlapped kernel and identical stride when generating local tokens, e.g.…”
Section: Rsa Regional Tokens (mentioning)
confidence: 99%
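The tokenization change described in the excerpt above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the citing paper's implementation. The helper name `conv_tokenize`, the 32×32×3 input, the 16-dimensional tokens, and the kernel/stride/padding values (4/4/0 for ViT-style non-overlapping patches, 7/4/3 for an overlapping kernel with the same stride) are all hypothetical, because the excerpt breaks off before giving its own values. Both settings produce the same 8×8 grid of tokens; the overlapping kernel only widens each token's receptive field so that neighbouring tokens share pixels.

```python
import numpy as np


def conv_tokenize(image, weight, stride, padding):
    """Turn an (H, W, C_in) image into a sequence of D-dim tokens with one conv layer.

    Hypothetical helper for illustration. weight has shape (k, k, C_in, D).
    With k == stride the windows tile the image (ViT-style patches);
    with k > stride neighbouring windows overlap.
    """
    k = weight.shape[0]
    # zero-pad only the two spatial axes
    x = np.pad(image, ((padding, padding), (padding, padding), (0, 0)))
    H, W, _ = x.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    tokens = np.empty((out_h * out_w, weight.shape[-1]))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # contract the (k, k, C_in) window with the kernel -> one D-dim token
            tokens[i * out_w + j] = np.tensordot(window, weight, axes=([0, 1, 2], [0, 1, 2]))
    return tokens


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
w_patch = rng.standard_normal((4, 4, 3, 16))    # non-overlapping: kernel 4, stride 4
w_overlap = rng.standard_normal((7, 7, 3, 16))  # overlapping: kernel 7, same stride 4

tokens_patch = conv_tokenize(image, w_patch, stride=4, padding=0)
tokens_overlap = conv_tokenize(image, w_overlap, stride=4, padding=3)
print(tokens_patch.shape, tokens_overlap.shape)  # (64, 16) (64, 16)
```

Because the stride is unchanged, the token count, and therefore the cost of the attention layers that follow, stays the same; the extra work is confined to the larger convolution kernel.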
“…Sliding window [44] 81.4 ----; Shifted window [39] 81.3 42.2 39.1 41.5; Spatially Sep [12] 81.5 42.7 39.5 42.9; Sequential Axial [26] 81.5 40. 4 … caused by other factors. The results are reported in Table 8a.…”
Section: Ablation Study (mentioning)
confidence: 99%
“…Comparison to Concurrent work. CrossViT [6] also utilizes different patch sizes (e.g., small and large) and dual-paths in a single-stage structure as ViT [16] and XCiT [17]. However, CrossViT's interactions between branches only occur through [CLS] tokens, while MPViT allows all patches of different scales to interact.…”
Section: Related Work (mentioning)
confidence: 99%
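The contrast drawn in this excerpt, that CrossViT's branches exchange information only through [CLS] tokens, can be illustrated with a single-head NumPy sketch. Following the CrossViT cross-attention formulation, the [CLS] token of one branch is the only query, while keys and values come from that [CLS] token concatenated with the other branch's patch tokens. Assumptions: one head, no LayerNorm, both branches already share the width `D` (CrossViT's projections between branch widths are omitted), random weights, and the helper name `cls_cross_attention` is hypothetical.

```python
import numpy as np


def softmax(z):
    # numerically stable softmax over a 1-D array
    e = np.exp(z - z.max())
    return e / e.sum()


def cls_cross_attention(cls_a, patches_b, Wq, Wk, Wv):
    """Single-head cross-attention where branch A's [CLS] token is the only query.

    Hypothetical helper for illustration.
    cls_a:     (D,)   [CLS] token of branch A (assumed already at branch B's width)
    patches_b: (N, D) patch tokens of branch B
    Returns the updated [CLS] token (D,) and the attention weights (N + 1,).
    """
    # keys/values are built from [CLS_a ; patches_b]
    x = np.vstack([cls_a[None, :], patches_b])   # (N + 1, D)
    q = cls_a @ Wq                                 # (D,)  a single query
    k = x @ Wk                                     # (N + 1, D)
    v = x @ Wv                                     # (N + 1, D)
    attn = softmax(k @ q / np.sqrt(q.shape[0]))    # (N + 1,)  one row, so linear in N
    return cls_a + attn @ v, attn                  # residual update of the [CLS] token only


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
D, N = 8, 16
cls_large = rng.standard_normal(D)           # [CLS] of the large-patch branch
patches_small = rng.standard_normal((N, D))  # patch tokens of the small-patch branch
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

new_cls, weights = cls_cross_attention(cls_large, patches_small, Wq, Wk, Wv)
print(new_cls.shape, weights.shape)  # (8,) (17,)
```

With a single query the attention map is a vector of length N + 1, so the cost grows linearly in the number of patch tokens. The patch tokens themselves are not updated by this step; per the excerpt, that restriction is what MPViT's interaction among all patches of different scales removes.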