2023
DOI: 10.1109/tpami.2022.3152247
A Survey on Vision Transformer

Cited by 926 publications (173 citation statements)
References 110 publications
“…Vision transformers In computer vision, convolutional networks have become by far the dominating model class over the last decade. Vision transformers [33] break with the long tradition of using convolutions and are rapidly gaining traction [56]. We find that the best vision transformer (ViT-L trained on 14M images) even exceeds human OOD accuracy (Figure 1a shows the average across 17 datasets).…”
Section: Models (mentioning)
confidence: 93%
“…Inspired by the major success of transformer architectures in the field of NLP, researchers have recently applied transformer to computer vision (CV) tasks [13]. Chen et al [6] trained a sequence transformer to auto-regressively predict pixels, achieving results comparable to CNNs on image classification tasks.…”
Section: Vision Transformer (mentioning)
confidence: 99%
“…Finally, a full connection layer is connected to complete the ViT image classification task. [50]. Meanwhile, the main highlight of ViT is to show that it does not rely on convolutional neural networks and can also achieve good results in image classification [17].…”
Section: Visual Transformers (mentioning)
confidence: 99%
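The citation statements above describe the ViT pipeline: split the image into fixed-size patches, linearly embed them, prepend a classification token, apply self-attention, and finish with a fully connected classification layer. The following is a minimal, hypothetical numpy sketch of that flow, not the authors' implementation; random weights stand in for learned parameters, and the attention block is reduced to a single head with identity Q/K/V projections for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vit_classify(image, patch=4, dim=8, n_classes=10):
    """Toy single-block, single-head ViT forward pass (illustrative only)."""
    H, W, C = image.shape
    # 1. patchify: (num_patches, patch*patch*C)
    patches = (image.reshape(H // patch, patch, W // patch, patch, C)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * C))
    # 2. linear patch embedding (random weights stand in for learned ones)
    W_embed = rng.normal(size=(patches.shape[1], dim))
    tokens = patches @ W_embed
    # 3. prepend a classification token
    cls = rng.normal(size=(1, dim))
    x = np.vstack([cls, tokens])
    # 4. one self-attention layer (Q/K/V projections omitted: identity)
    attn = softmax(x @ x.T / np.sqrt(dim))
    x = attn @ x
    # 5. final fully connected layer on the classification token
    W_head = rng.normal(size=(dim, n_classes))
    return x[0] @ W_head  # class logits

logits = vit_classify(rng.normal(size=(8, 8, 3)))
print(logits.shape)  # (10,)
```

An 8x8 RGB image with 4x4 patches yields 4 patch tokens plus the classification token, so attention runs over 5 tokens; real ViT models stack many such blocks and learn all the weights shown here as random placeholders.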