2022
DOI: 10.1007/978-3-031-19806-9_27

VSA: Learning Varied-Size Window Attention in Vision Transformers

Cited by 35 publications (14 citation statements) | References 31 publications
“…In all 3 groups, our model consistently outperforms other compared ones. For example, for models in the smallest group (∼2G FLOPs), our BiFormer-T achieves 81.4% top-1 accuracy, 1.4% better than the most competitive QuadTree-b1 [38]. For models in the second group (∼4G FLOPs), BiFormer-S achieves 83.8% top-1 accuracy.…”
Section: Image Classification on ImageNet-1K
confidence: 97%
“…The key observation which motivates our work is that the attentive regions for different queries may differ significantly, according to visualizations of pretrained ViT [15] and DETR [1]. As we achieve the goal of query-adaptive sparsity in a coarse-to-fine manner, our approach shares some similarities with quad-tree attention [38]. Different from quad-tree attention, the goal of our bi-level routing attention is to locate the few most relevant key-value pairs, while quad-tree attention builds a token pyramid and assembles messages from all levels of different granularities.…”
Section: Related Work
confidence: 99%
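
The bi-level routing idea described in the statement above can be sketched in a few lines of PyTorch. The code below is a minimal, illustrative single-head sketch, not the BiFormer authors' released implementation: region summaries are compared first, each query region keeps its top-k most relevant key regions, and full attention then runs only over tokens gathered from those regions. The function name, mean-pooled region summaries, and the region/top-k parameters are all simplifying assumptions.

```python
# Illustrative sketch of bi-level routing attention (not the authors' code).
import torch
import torch.nn.functional as F

def bilevel_routing_attention(q, k, v, num_regions, topk):
    # q, k, v: (B, N, C) token features; N must split evenly into regions.
    B, N, C = q.shape
    R = num_regions
    n = N // R                                  # tokens per region
    qr = q.view(B, R, n, C)
    kr = k.view(B, R, n, C)
    vr = v.view(B, R, n, C)

    # Coarse level: summarize each region by mean pooling.
    q_coarse = qr.mean(dim=2)                   # (B, R, C)
    k_coarse = kr.mean(dim=2)                   # (B, R, C)

    # Route each query region to its top-k most relevant key regions.
    affinity = q_coarse @ k_coarse.transpose(-1, -2)       # (B, R, R)
    idx = affinity.topk(topk, dim=-1).indices              # (B, R, topk)

    # Gather key/value tokens from the selected regions only.
    idx_exp = idx[..., None, None].expand(-1, -1, -1, n, C)
    k_sel = torch.gather(kr[:, None].expand(-1, R, -1, -1, -1), 2, idx_exp)
    v_sel = torch.gather(vr[:, None].expand(-1, R, -1, -1, -1), 2, idx_exp)
    k_sel = k_sel.reshape(B, R, topk * n, C)
    v_sel = v_sel.reshape(B, R, topk * n, C)

    # Fine level: full attention restricted to the routed tokens.
    attn = (qr @ k_sel.transpose(-1, -2)) / C ** 0.5       # (B, R, n, topk*n)
    out = F.softmax(attn, dim=-1) @ v_sel                  # (B, R, n, C)
    return out.reshape(B, N, C)

x = torch.randn(2, 64, 32)                      # 64 tokens in 8 regions of 8
y = bilevel_routing_attention(x, x, x, num_regions=8, topk=2)
print(y.shape)                                  # torch.Size([2, 64, 32])
```

Because each query region attends to only `topk * n` routed tokens instead of all `N`, the fine-level cost scales with the number of kept key-value pairs, which is the query-adaptive sparsity the citing authors contrast with quad-tree attention's level-wise message assembly.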
“…Therefore, we consider integrating the Quadtree attention module into the skip connection. Quadtree attention is an effective Transformer-based attention variant [46], as shown in Figure 2. The module computes attention in a coarse-to-fine manner and is able to capture both long-range dependencies and local interactions, achieving better results with less computation on various vision tasks.…”
Section: Quadtree Attention
confidence: 99%
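
To make the coarse-to-fine computation concrete, here is a hedged two-level sketch of quadtree-style attention. It is a simplification of QuadTree Attention rather than the released implementation: each query attends over 2x2-pooled coarse tokens, refines only its top-k coarse cells at the fine level, and simply sums the messages from the two levels (the published method assembles level-wise messages with learned weighting). The function name and the message-assembly rule are illustrative assumptions.

```python
# Simplified two-level quadtree-style attention (illustrative only).
import torch
import torch.nn.functional as F

def quadtree_attention_2level(q, k, v, H, W, topk=4):
    # q, k, v: (B, H*W, C) tokens on an H x W grid; H and W must be even.
    B, N, C = q.shape
    # Coarse tokens: average-pool keys/values over 2x2 windows.
    k2 = F.avg_pool2d(k.transpose(1, 2).reshape(B, C, H, W), 2)
    v2 = F.avg_pool2d(v.transpose(1, 2).reshape(B, C, H, W), 2)
    k2 = k2.flatten(2).transpose(1, 2)                     # (B, N/4, C)
    v2 = v2.flatten(2).transpose(1, 2)

    # Coarse level: every query attends over all coarse tokens.
    s_coarse = (q @ k2.transpose(-1, -2)) / C ** 0.5       # (B, N, N/4)
    msg = F.softmax(s_coarse, dim=-1) @ v2                 # coarse message

    # Fine level: refine only the top-k coarse cells per query,
    # i.e. the 4 child tokens of each selected 2x2 window.
    idx = s_coarse.topk(topk, dim=-1).indices              # (B, N, topk)
    r, c = idx // (W // 2), idx % (W // 2)                 # coarse row/col
    child = torch.stack([(2 * r + dr) * W + (2 * c + dc)
                         for dr in (0, 1) for dc in (0, 1)], -1)
    child = child.flatten(2)                               # (B, N, 4*topk)

    k_sel = torch.gather(k[:, None].expand(-1, N, -1, -1), 2,
                         child[..., None].expand(-1, -1, -1, C))
    v_sel = torch.gather(v[:, None].expand(-1, N, -1, -1), 2,
                         child[..., None].expand(-1, -1, -1, C))
    s_fine = (q[:, :, None] @ k_sel.transpose(-1, -2)).squeeze(2) / C ** 0.5
    msg = msg + (F.softmax(s_fine, dim=-1)[:, :, None] @ v_sel).squeeze(2)
    return msg                                             # (B, N, C)

x = torch.randn(1, 16 * 16, 32)
print(quadtree_attention_2level(x, x, x, H=16, W=16).shape)  # (1, 256, 32)
```

Skipping all fine-level tokens outside the selected coarse cells is what keeps the per-query cost bounded by `topk`, which is why the citing authors describe the module as cheap enough to place on a skip connection.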
“…DynamicViT [25] devises a lightweight prediction module to estimate the importance score of each token and determine which tokens to prune dynamically. QuadTree Attention [26] builds token pyramids and computes attention according to the attention scores: it skips irrelevant regions at the fine level if their corresponding coarse-level regions are not promising, thereby reducing the computational complexity from quadratic to linear.…”
Section: Dynamic Token Generation
confidence: 99%
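
The lightweight prediction module attributed to DynamicViT can likewise be sketched. The code below is an illustrative approximation, not the authors' implementation: a small MLP scores each token and a hard top-k keeps the highest-scoring ones. DynamicViT itself uses Gumbel-Softmax sampling during training so that pruning stays differentiable; only the inference-style hard selection is shown here, and the class name and keep ratio are assumptions.

```python
# Illustrative DynamicViT-style token pruner (inference-style hard top-k).
import torch
import torch.nn as nn

class TokenPruner(nn.Module):
    def __init__(self, dim, keep_ratio=0.7):
        super().__init__()
        self.keep_ratio = keep_ratio
        # Lightweight prediction head: one keep logit per token.
        self.score = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, dim // 4),
            nn.GELU(), nn.Linear(dim // 4, 1))

    def forward(self, x):
        # x: (B, N, C) tokens; returns the kept subset (B, N_keep, C).
        B, N, C = x.shape
        logits = self.score(x).squeeze(-1)           # (B, N) importance scores
        n_keep = max(1, int(N * self.keep_ratio))
        idx = logits.topk(n_keep, dim=-1).indices    # hard top-k at inference
        return torch.gather(x, 1, idx[..., None].expand(-1, -1, C))

pruner = TokenPruner(dim=64, keep_ratio=0.5)
tokens = torch.randn(2, 196, 64)
print(pruner(tokens).shape)                          # torch.Size([2, 98, 64])
```

Dropping tokens this way shrinks the sequence fed to every subsequent attention block, which is the complementary route to efficiency that the citing survey contrasts with QuadTree Attention's region skipping.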