2022
DOI: 10.48550/arxiv.2201.02767
Preprint
QuadTree Attention for Vision Transformers

Abstract: Transformers have been successful in many vision tasks, thanks to their capability of capturing long-range dependency. However, their quadratic computational complexity poses a major obstacle for applying them to vision tasks requiring dense predictions, such as object detection, feature matching, stereo, etc. We introduce QuadTree Attention, which reduces the computational complexity from quadratic to linear. Our quadtree transformer builds token pyramids and computes attention in a coarse-to-fine manner. At …
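The coarse-to-fine idea in the abstract can be illustrated with a minimal NumPy sketch: pool the key map into a coarse pyramid level, score each query against the coarse regions, then attend only over the fine tokens inside the top-K regions. This is an illustrative two-level sketch under assumed shapes, not the paper's implementation; all function names (`pool_tokens`, `quadtree_attention_sketch`) are hypothetical.

```python
import numpy as np

def pool_tokens(x, factor=2):
    # Average-pool an (H, W, C) token map by `factor` to build a coarser level.
    H, W, C = x.shape
    return x.reshape(H // factor, factor, W // factor, factor, C).mean(axis=(1, 3))

def attention(q, k, v):
    # Standard scaled dot-product attention over flattened tokens.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def quadtree_attention_sketch(q_map, k_map, v_map, topk=2):
    # Hypothetical two-level quadtree attention: coarse scoring, fine refinement.
    H, W, C = k_map.shape
    kc = pool_tokens(k_map)                        # coarse keys, (H/2, W/2, C)
    Wc = kc.shape[1]
    q_flat = q_map.reshape(-1, C)
    # Coarse pass: score each query against the pooled key regions.
    scores_c = q_flat @ kc.reshape(-1, C).T / np.sqrt(C)
    out = np.zeros_like(q_flat)
    for i, q in enumerate(q_flat):
        # Fine pass: keep only the top-K coarse cells and attend over the fine
        # tokens inside them, so per-query cost depends on topk, not on H*W.
        sel = np.argsort(scores_c[i])[-topk:]
        fine_k, fine_v = [], []
        for s in sel:
            r, c = divmod(int(s), Wc)
            rows, cols = slice(2 * r, 2 * r + 2), slice(2 * c, 2 * c + 2)
            fine_k.append(k_map[rows, cols].reshape(-1, C))
            fine_v.append(v_map[rows, cols].reshape(-1, C))
        out[i] = attention(q[None, :], np.concatenate(fine_k),
                           np.concatenate(fine_v))[0]
    return out.reshape(q_map.shape)
```

With `topk` fixed, each query attends to a constant number of fine tokens per level, which is the source of the quadratic-to-linear reduction the abstract claims; the paper's method recurses this selection over a deeper token pyramid.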

Cited by 4 publications (25 citation statements) | References 28 publications
“…To solve this issue, transformer-based detector-free methods have emerged as more robust alternatives, demonstrating impressive matching abilities in texture-less regions [43,18,47,57,4]. However, the high computational cost of attention limits transformer-based methods to 'semi-dense' matching, where source matching points are spaced apart at intervals of coarse feature space, as shown in Fig.…”
Section: Methods
confidence: 99%
“…Figure 1: QuadTree [47] (a,d) vs our CasMTR (b,c,e). Our method achieves more fine-grained matching pairs for both source and target images (b).…”
Section: Methods
confidence: 99%
“…We refer to (Fisher, 2012) for an overview of the main variants. Adaptive quadtrees have also been successfully introduced in Transformer architectures (Tang et al, 2022), suggesting that further techniques linked to Collages and fractal compression may be beneficial in this domain. Finally, images generated from IFSs have been used to construct artificial pretraining datasets for large vision models (Kataoka et al, 2020).…”
Section: Fractal Compression
confidence: 99%