ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

2023 · DOI: 10.1007/s11263-022-01739-w

Cited by 94 publications (28 citation statements) · References: 64 publications

“…To entail a fair comparison, we keep the same data augmentation and training settings as the other vision transformers as far as possible. The competitors are all competitive vision transformers, including DeiT [2], PVT [3], T2T-ViT [19], TNT [20], CViT [21], Twins [22], Swin [4], NesT [23], CvT [9], ViL [24], CAT [5], ResT [25], TransCNN [26], Shuffle [27], BoTNet [28], RegionViT [29], ViTAEv2 [30], MPViT [31], ScalableViT [32], DaViT [33], and CoAtNet [34].…”
Section: Methods (mentioning; confidence: 99%)

“…A possible reason is that MSAs at the end of each stage act as spatial smoothing and aggregation [20], thus neglecting details unavoidably. To this end, we propose the I-PLDE module, a parallel branch emphasizing local detail on the top of the vertical hybrid design, inspired by the "divide-and-conquer" idea in [30,32]. I-PLDE consists of a 1x1 convolution to match hidden dimension with its parallel branch, three stacked depth-wise convolution layers and an window embedding operation. SiLU is used for non-linear activation following the convention in [30,32]. The output of I-PLDE F^{CE}_i is added after W-MSA for preserving details.…”
Section: DuDoRNeXt: Towards Hybridizing CNNs and ViTs (mentioning; confidence: 99%)
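The statement above describes the I-PLDE branch only in prose. The following is a minimal NumPy sketch of such a parallel local-detail branch, written under assumptions that the excerpt does not state: 3x3 depth-wise kernels with zero padding, a SiLU after each depth-wise layer, and "window embedding" read as partitioning the feature map into non-overlapping windows so that the result can be added to a W-MSA output of shape (num_windows, ws*ws, C). All function names (`local_detail_branch`, `window_partition`, etc.) are hypothetical; this is an illustration, not the DuDoRNeXt authors' code.

```python
import numpy as np

def silu(x):
    # SiLU (swish): x * sigmoid(x), written as x / (1 + e^-x)
    return x / (1.0 + np.exp(-x))

def conv1x1(x, w):
    # x: (H, W, C_in), w: (C_in, C_out); a pointwise conv is a per-pixel matmul
    return x @ w

def depthwise_conv3x3(x, k):
    # x: (H, W, C), k: (3, 3, C); one 3x3 filter per channel (assumed kernel size),
    # zero "same" padding so the spatial size is preserved
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W, :] * k[i, j, :]
    return out

def window_partition(x, ws):
    # Hypothetical reading of "window embedding": (H, W, C) -> (num_windows, ws*ws, C),
    # the token layout used by window-based MSA
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0
    x = x.reshape(H // ws, ws, W // ws, ws, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, ws * ws, C)

def local_detail_branch(x, w_proj, dw_kernels, ws):
    # 1x1 conv matches the hidden dimension of the parallel (attention) branch
    y = conv1x1(x, w_proj)
    # three stacked depth-wise convs, each followed by SiLU (assumed placement)
    for k in dw_kernels:
        y = silu(depthwise_conv3x3(y, k))
    return window_partition(y, ws)
```

Usage: with an 8x8 map, 16 input channels, a hidden dimension of 32 and 4x4 windows, the branch output has shape (4, 16, 32) and can be summed element-wise with a W-MSA output of the same shape, e.g. `fused = attn_out + local_detail_branch(x, w_proj, kernels, ws=4)`, where `attn_out` stands in for the attention branch.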