2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01054
A-ViT: Adaptive Tokens for Efficient Vision Transformer

Cited by 125 publications (47 citation statements)
References 19 publications
“…• A-ViT [22]: A-ViT uses an adaptive token discarding framework to expedite inference in vision transformers.…”
Section: B. Comparison With SOTA Methods
Mentioning confidence: 99%
“…Evo-ViT [21] uses a slow-fast token evolution strategy to update the attractive and unattractive patch groups. A-ViT [22] augments the vision transformer block with adaptive halting modules that compute a halting probability to prune unnecessary tokens.…”
Section: B. Patch Slimming
Mentioning confidence: 99%
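The halting mechanism described in the quote above can be sketched in a few lines. This is a minimal, hypothetical NumPy illustration (the function name `halting_mask` and the array layout are my own, not from A-ViT's code): each layer's halting module emits a probability per token, and a token is pruned at the first layer where its accumulated halting score exceeds 1 - eps.

```python
import numpy as np

def halting_mask(halting_probs, eps=0.01):
    """Compute which tokens remain active at each layer.

    halting_probs: (num_layers, num_tokens) array of per-layer
        halting probabilities emitted by the halting modules.
    Returns a boolean (num_layers, num_tokens) mask; a token is
    inactive from the first layer where its cumulative halting
    score reaches 1 - eps, and contributes no further compute.
    """
    num_layers, num_tokens = halting_probs.shape
    active = np.ones((num_layers, num_tokens), dtype=bool)
    cumulative = np.zeros(num_tokens)
    for layer in range(num_layers):
        # A token runs this layer only if it has not yet halted.
        active[layer] = cumulative < 1.0 - eps
        # Halted tokens stop accumulating (and stop computing).
        cumulative += halting_probs[layer] * active[layer]
    return active
```

In the real model the per-token savings come from skipping the attention and MLP computation for halted tokens; this sketch only tracks the schedule.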
“…Various techniques have been developed to train ViT efficiently. Among them, token sparsification (Pan et al., 2021; Rao et al., 2021; Tang et al., 2022; Yin et al., 2022) removes redundant tokens (image patches) of the data to improve computational complexity while maintaining comparable learning performance. For example, under what conditions does a Transformer achieve satisfactory generalization?…”
Section: Introduction
Mentioning confidence: 99%
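The token sparsification the quote above refers to reduces to a score-and-keep step. Here is a generic, hedged sketch (names `sparsify_tokens` and the score source are assumptions; in practice the score might come from attention to the class token or a learned predictor, as in the cited works):

```python
import numpy as np

def sparsify_tokens(tokens, scores, keep_ratio=0.5):
    """Keep only the highest-scoring fraction of tokens.

    tokens: (n, d) array of token embeddings.
    scores: (n,) importance score per token.
    Returns the kept tokens and their original indices,
    in spatial order.
    """
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    # Indices of the top-k tokens, then restored to spatial order.
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return tokens[keep], keep
```

Because attention and MLP cost scale with the token count, halving the tokens roughly halves the per-layer FLOPs from that point on.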
“…(3) We demonstrate the efficacy of GradMDM on multiple dynamic neural networks across various datasets, where GradMDM effectively increases computations with less perceptible perturbations across all settings. [19], [20], [21], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33] achieve efficiency while maintaining good accuracy by dynamically adjusting model architectures to allocate appropriate computation conditioned on each input sample. This reduces redundant computations on those "easy" samples, improving the inference efficiency [18].…”
Section: Introduction
Mentioning confidence: 99%
“…SkipNet [19] assigns a gating module to each convolutional block in CNNs, which decides whether to execute or skip it, and is trained via reinforcement learning. Dynamic width networks [21], [32], [34] selectively activate multiple components within the same layer, such as channels and neurons, based on each instance. Early studies [35], [36], [37] achieve dynamic width by adaptively controlling the activation of neurons or parameters, e.g., via stochastic gating units [35], [36].…”
Section: Introduction
Mentioning confidence: 99%
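The per-block gating pattern described in the quote above can be sketched as follows. This is a minimal illustration under my own assumptions (`gated_block`, `block_fn`, and `gate_fn` are hypothetical stand-ins for a conv block and its learned gate; SkipNet trains the gate with reinforcement learning, which is not shown here):

```python
import numpy as np

def gated_block(x, block_fn, gate_fn, threshold=0.5):
    """Execute block_fn(x) only if the gate fires; otherwise
    pass x through unchanged (the identity shortcut).

    gate_fn maps the input to a scalar execution score; at
    inference the block is skipped whenever the score falls
    below the threshold, saving its computation.
    """
    if gate_fn(x) >= threshold:
        return block_fn(x)  # "hard" sample: run the block
    return x                # "easy" sample: skip it entirely
```

Because skipped blocks contribute zero FLOPs, the expected cost per sample is the sum of block costs weighted by their gate firing rates.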