2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01174
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers

Cited by 23 publications (17 citation statements) · References 27 publications
“…In Tab. 1, G2SD is compared with 1) supervised methods including MobileNet-v3 [19], ResNet [15,48], DeiT [41,42], Swin Transformer [28] and ConvNeXt [29]; 2) self-supervised methods upon ViT-Small, like BEiT [4] and CAE [8]; and 3) distillation methods upon vanilla ViTs, like DeiT⚗ [41], DearKD [7], Manifold [21], MKD [27], SSTA [49] and DMAE [3]. G2SD achieves 82.5% top-1 accuracy, outperforming the CNN-based ConvNeXt by 0.4% while using fewer parameters (22M vs. 29M).…”
Section: Results (mentioning)
confidence: 99%
“…One solution is to explicitly introduce convolutional operators into ViTs [31,50] to make them competitive with lightweight CNNs [19]. The other is to let large models act as teachers that transfer inductive bias to ViTs in a knowledge distillation fashion [7,41,50]. This study focuses on the latter.…”
Section: Related Work (mentioning)
confidence: 99%
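The teacher-to-student transfer described in this citation statement reduces, at its core, to adding a distillation term to the training loss. Below is a minimal PyTorch sketch of a soft distillation objective in the spirit of DeiT [41] and DearKD [7]; the function name, the temperature tau, and the weight alpha are illustrative assumptions, not the exact recipe of either paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, tau=3.0):
    """Blend the usual cross-entropy with a KL term that pulls the
    student toward the temperature-softened teacher distribution.
    alpha and tau are illustrative hyperparameters."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.log_softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (tau * tau)  # rescale so gradient magnitude is comparable across tau
    return (1.0 - alpha) * ce + alpha * kd

# Usage sketch: teacher is a frozen CNN, student a ViT.
# with torch.no_grad():
#     t_logits = teacher(images)
# loss = distillation_loss(student(images), t_logits, labels)
```

With a frozen CNN teacher (e.g., a ResNet), the KL term is the channel through which convolutional inductive bias reaches the ViT student; DeiT⚗ additionally routes it through a dedicated distillation token, which this sketch omits.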
“…Therefore, many algorithms have been proposed to improve the efficiency of vision transformers. Recent works demonstrate that some popular model compression methods such as network pruning [17,7,8,70], knowledge distillation [20,54,9], and quantization [46,51] can be applied to ViTs. Besides, other methods introduce CNN properties such as hierarchy and locality into the transformers to alleviate the burden of computing global attention [35,5].…”
Section: Efficient Vision Transformers (mentioning)
confidence: 99%
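As a concrete instance of the compression methods this citation statement lists, post-training dynamic quantization can be applied to a ViT's linear layers in a few lines with PyTorch's built-in quantize_dynamic. The sketch below is illustrative, not any cited paper's method, and the timm model name is an assumption.

```python
import torch
import timm  # assumption: timm is installed and provides pretrained ViTs

# Illustrative model choice, not a specific paper's architecture.
model = timm.create_model("vit_small_patch16_224", pretrained=True).eval()

# Post-training dynamic quantization: nn.Linear weights are stored in
# int8 and dequantized on the fly at inference time, shrinking the
# model and speeding up CPU inference without any retraining.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = quantized(x)
print(logits.shape)  # torch.Size([1, 1000])
```

Because dynamic quantization only rewrites weights at rest and needs no calibration data, it is a convenient first pass before the more invasive pruning or distillation approaches the quote cites.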
“…Despite its promising accuracy, the ViT [12] is a computational heavyweight. To address this issue, several algorithms have been proposed to improve the efficiency of vision transformers in different ways [5,6,21,28,37,40,52].…”
Section: Efficient Vision Transformers (mentioning)
confidence: 99%