2021
DOI: 10.48550/arxiv.2105.02358
Preprint

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

Abstract: Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions to capture the long-range dependency within a single sample. However, self-attention has quadratic complexity and ignores potential correlation between different samples. This paper proposes a novel attention mechanism which we call e…
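The abstract is truncated here. As a minimal sketch of the mechanism it describes (replacing pair-wise affinities within a sample by attention against two small learnable linear layers), the following PyTorch-style module is an assumed illustration, not the authors' released code; the memory size S, tensor shapes, and the double-normalization scheme are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExternalAttentionSketch(nn.Module):
    """Attention against two external linear layers (learnable memories).

    Each of the N positions attends to a small memory of S slots, so the
    cost is O(N * S) instead of the O(N^2) of pair-wise self-attention.
    """

    def __init__(self, d_model: int, mem_size: int = 64):
        super().__init__()
        self.mk = nn.Linear(d_model, mem_size, bias=False)  # "key" memory
        self.mv = nn.Linear(mem_size, d_model, bias=False)  # "value" memory

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, d_model) -- a flattened feature map
        attn = self.mk(x)                                      # (batch, N, S) affinities to memory slots
        attn = F.softmax(attn, dim=1)                          # normalize over the N positions
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)  # then l1-normalize over slots (assumed scheme)
        return self.mv(attn)                                   # (batch, N, d_model)


x = torch.randn(2, 196, 256)                  # e.g. a 14x14 feature map flattened to 196 positions
print(ExternalAttentionSketch(256)(x).shape)  # torch.Size([2, 196, 256])
```

This also hints at why the quadratic cost disappears: the affinity matrix is N × S with S fixed and small, rather than N × N.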

Cited by 42 publications (41 citation statements)
References 78 publications

Citation statements (ordered by relevance):
“…However, since ViT lacks an intrinsic inductive bias for modeling local visual structures, it instead learns this IB implicitly from large amounts of data. Follow-up works in this direction simplify the model structure to have fewer intrinsic IBs and learn them directly from large-scale data [42,63,64,18,15], which has achieved promising results and is being studied actively. Another direction is to leverage the intrinsic IB from CNNs to facilitate the training of vision transformers, e.g., using less training data or shorter training schedules.…”
Section: Vision Transformers with Learned IB (mentioning)
confidence: 99%
“…To avoid the drawbacks of the aforementioned learning architectures, and with the aim of achieving better results at lower computational cost, four architectures were very recently proposed almost simultaneously [16,7,12,17]. Their common aim is to take full advantage of linear layers.…”
Section: Four Recent Architectures (mentioning)
confidence: 99%
“…External attention [7] reveals the relation between self-attention and linear layers. It first simplifies self-attention as in Eq.…”
Section: External Attention (mentioning)
confidence: 99%
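The equation referenced in this excerpt is cut off above. As a sketch of the relation it points to, the simplified self-attention and its two-linear-layer counterpart can be written as follows; the notation (feature map F with N positions, external memory M with S entries, and the normalization Norm) is assumed here rather than taken from the truncated quote.

```latex
% Simplified self-attention: affinities are computed within the feature map F \in \mathbb{R}^{N \times d}
A = \operatorname{softmax}\!\left(F F^{\top}\right), \qquad F_{\mathrm{out}} = A F
% External attention: F attends to a small learnable external memory M \in \mathbb{R}^{S \times d},
% so the cost is linear in N rather than quadratic
A = \operatorname{Norm}\!\left(F M^{\top}\right), \qquad F_{\mathrm{out}} = A M
```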