2021
DOI: 10.48550/arxiv.2107.08391
Preprint

AS-MLP: An Axial Shifted MLP Architecture for Vision

Abstract: An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper. Different from MLP-Mixer, where the global spatial feature is encoded for the information flow through matrix transposition and one token-mixing MLP, we pay more attention to local feature communication. By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different axial directions, which captures the local dependencies. Such an operation enables us to utilize a pure MLP architecture to ac…

Citations: cited by 42 publications (75 citation statements)
References: 46 publications (55 reference statements)
“…Recently, various variants have been developed to achieve a better trade-off between accuracy and computational cost. For example, a shift operation is introduced in S²-MLP [37] and AS-MLP [18] to exchange information across different tokens. Hire-MLP [8] presents a hierarchical rearrangement operation, where the inner-region rearrangement and cross-region rearrangement capture local information and global context, respectively.…”
Section: Related Work (mentioning)
confidence: 99%
“…In addition, we adopt another setting following [22,18,8], i.e., multi-scale training strategy and "3x" schedule, based on Mask R-CNN [12] and Cascade Mask R-CNN [1].…”
Section: Object Detection (mentioning)
confidence: 99%
“…This newly proposed network is called S2MLPv2 [94]. Rather than grouping channels and then performing the same shift operation for every channel in a group, the Axial Shifted MLP (AS-MLP) [95] performs a different operation on each channel within a group, where a group contains a few channels, such as 3 or 5. For example, every three consecutive feature maps in the channel direction are, respectively, left-shifted, not shifted, right-shifted, and so on.…”
Section: Yu et al. from Baidu (mentioning)
confidence: 99%
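The shift pattern described in the statement above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the AS-MLP authors' implementation: the function name `axial_shift`, the `(C, H, W)` layout, the interleaved channel-to-offset mapping (taken from the wording of the statement), and zero filling at the border are all choices made here for illustration.

```python
import numpy as np

def axial_shift(x, axis, shift_size=3):
    """Shift channel c of x (shape C, H, W) by (c % shift_size) - shift_size // 2
    positions along `axis` (1 = vertical, 2 = horizontal), filling gaps with zeros.
    Hypothetical helper for illustration only."""
    out = np.zeros_like(x)
    length = x.shape[axis]
    a = axis - 1  # axis index inside a single (H, W) channel
    for c in range(x.shape[0]):
        offset = (c % shift_size) - shift_size // 2  # e.g. -1, 0, +1 for shift_size=3
        src = [slice(None), slice(None)]
        dst = [slice(None), slice(None)]
        if offset > 0:    # shift towards higher indices (right / down)
            src[a] = slice(0, length - offset)
            dst[a] = slice(offset, length)
        elif offset < 0:  # shift towards lower indices (left / up)
            src[a] = slice(-offset, length)
            dst[a] = slice(0, length + offset)
        out[c][tuple(dst)] = x[c][tuple(src)]
    return out

# Three channels, one row, four columns: channel 0 moves left, 1 stays, 2 moves right.
x = np.arange(1, 13, dtype=float).reshape(3, 1, 4)
shifted = axial_shift(x, axis=2)
```

After such a shift, a channel-mixing (1 × 1) projection at a given position sees values that originally sat at neighbouring positions, which is how a shift can let a per-position MLP capture local dependencies.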
“…Specifically, the whole architecture contains four stages, where the feature resolution reduces from H/4 × W/4 to H/32 × W/32 and the output dimension increases accordingly. The networks based on this design include Sparse MLP [91], HireMLP [100], AS-MLP [95] and CycleMLP [98]. Patch embedding can be equivalently achieved by a convolution layer whose kernel size and stride both equal the patch size.…”
Section: From Single-stage To Pyramid (mentioning)
confidence: 99%
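The equivalence between patch embedding and a strided convolution mentioned in the statement above can be checked numerically. The sketch below is an illustration under assumptions: the function names, the `(C, H, W)` image layout, the `(D, C, p, p)` weight layout, and the omission of a bias term are choices made here, and H and W are assumed to be divisible by the patch size `p`.

```python
import numpy as np

def patch_embed_linear(img, weight, p):
    """Cut img (C, H, W) into non-overlapping p x p patches, flatten each patch,
    and apply one shared linear map. weight has shape (D, C, p, p)."""
    C, H, W = img.shape
    D = weight.shape[0]
    patches = img.reshape(C, H // p, p, W // p, p).transpose(1, 3, 0, 2, 4)
    flat = patches.reshape(H // p, W // p, C * p * p)  # one row per patch
    return flat @ weight.reshape(D, C * p * p).T       # (H/p, W/p, D)

def patch_embed_conv(img, weight, p):
    """Same weights used as a convolution (cross-correlation) with
    kernel size = stride = p, written out as explicit loops."""
    C, H, W = img.shape
    D = weight.shape[0]
    out = np.zeros((H // p, W // p, D))
    for i in range(H // p):
        for j in range(W // p):
            window = img[:, i * p:(i + 1) * p, j * p:(j + 1) * p]
            out[i, j] = np.tensordot(weight, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
img = rng.normal(size=(3, 8, 8))        # C=3, H=W=8
weight = rng.normal(size=(5, 3, 4, 4))  # D=5 output channels, patch size 4
a = patch_embed_linear(img, weight, 4)
b = patch_embed_conv(img, weight, 4)
```

Because the stride equals the kernel size, the convolution windows never overlap, so each output position depends on exactly one patch; this is why the two computations agree up to floating-point rounding.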