“…To ensure a fair comparison, we keep the data augmentation and training settings the same as those of the other vision transformers as far as possible. The baselines are all competitive vision transformers, including DeiT [2], PVT [3], T2T-ViT [19], TNT [20], CViT [21], Twins [22], Swin [4], NesT [23], CvT [9], ViL [24], CAT [5], ResT [25], TransCNN [26], Shuffle [27], BoTNet [28], RegionViT [29], ViTAEv2 [30], MPViT [31], ScalableViT [32], DaViT [33], and CoAtNet [34].…”