2021
DOI: 10.48550/arxiv.2106.12620
Preprint
IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers

Abstract: The self-attention-based model, the transformer, has recently become the leading backbone in the field of computer vision. Despite the impressive success of transformers across a variety of vision tasks, they still suffer from heavy computation and intensive memory costs. To address this limitation, this paper presents an Interpretability-Aware REDundancy REDuction framework (IA-RED$^2$). We start by observing a large amount of redundant computation, mainly spent on uncorrelated input patches, and then introduc…

Cited by 4 publications (1 citation statement) · References 68 publications
“…There are two major approaches: unstructured token sparsification and structured token division. The majority of works, including PatchSlim [41], TokenSparse [36], GlobalEncoder [39], IA-RED [32], and Tokenlearner [37], focus on the former. TokenLearner [37] uses an MLP to reduce the number of tokens.…”
Section: Dynamic Vision Transformer
Confidence: 99%