2021
DOI: 10.48550/arxiv.2105.15168
Preprint

MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

Abstract: Transformers have offered a new methodology of designing neural networks for visual recognition. Compared to convolutional networks, Transformers enjoy the ability of referring to global features at each stage, yet the attention module brings higher computational overhead that obstructs the application of Transformers to process high-resolution visual data. This paper aims to alleviate the conflict between efficiency and flexibility, for which we propose a specialized token for each region that serves as a mes…

Cited by 11 publications (15 citation statements)
References 51 publications
“…The Swin transformer [21] used a shifted window mechanism. Although the design served as an excellent backbone for the following works [3,5,10,13,38], the shifted window communication had an overhead of careful cyclic shift and padding. Shortly after, the work [5] proposed two models, namely Twins-PCPVT and Twins-SVT.…”
Section: Related Work (mentioning)
confidence: 99%
“…The global sub-sampled attention (GSA) module used strided convolution function to summarise the local tokens and apply MSA for global interaction. The MSGTr [13] and RegionViT [3] proposed a similar idea of interacting with a regional level representative tokens. We discuss these works in more detail in the next section to clarify the design differences.…”
Section: Related Work (mentioning)
confidence: 99%