2021
DOI: 10.48550/arxiv.2109.04454
Preprint

ConvMLP: Hierarchical Convolutional MLPs for Vision

Abstract: MLP-based architectures, which consist of a sequence of consecutive multi-layer perceptron blocks, have recently been found to reach results comparable to convolutional and transformer-based methods. However, most adopt spatial MLPs, which take fixed-dimension inputs, making it difficult to apply them to downstream tasks such as object detection and semantic segmentation. Moreover, single-stage designs further limit performance in other computer vision tasks, and fully connected layers bear heavy comp…

Cited by 19 publications (29 citation statements)
References 39 publications
“…CycleMLP (Chen et al., 2021b) takes pseudo-kernels and samples tokens from different spatial locations for mixing. ConvMLP (Li et al., 2021) incorporates convolution layers and a pyramid structure to achieve local token mixing. Hire-MLP (Guo et al., 2021) rearranges tokens across local regions to gain performance and computational efficiency.…”
Section: Related Work
confidence: 99%
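
The local token-mixing pattern quoted above can be made concrete with a small PyTorch sketch. This is a minimal illustration, not the published ConvMLP block: the class name ConvTokenMixer, the depthwise 3x3 kernel, the BatchNorm placement, and the expansion ratio mlp_ratio are all illustrative assumptions.

import torch
import torch.nn as nn

class ConvTokenMixer(nn.Module):
    # Local token mixing: a depthwise 3x3 convolution mixes neighbouring
    # tokens, then 1x1 convolutions act as a per-token channel MLP.
    # Illustrative sketch, not the exact block from any of the cited papers.
    def __init__(self, dim, mlp_ratio=2):
        super().__init__()
        self.norm = nn.BatchNorm2d(dim)
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, kernel_size=1),
        )

    def forward(self, x):                # x: (B, C, H, W), any H and W
        x = x + self.conv(self.norm(x))  # spatial (token) mixing, residual
        x = x + self.mlp(x)              # channel mixing, residual
        return x

y = ConvTokenMixer(64)(torch.randn(1, 64, 56, 56))

Because both the depthwise convolution and the 1x1 projections are resolution-agnostic, the block accepts any H x W, which is the downstream-task advantage these statements allude to.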
“…Based on these pioneering studies, concurrent papers [5,11,18,23,25,28,44,56,57] address new issues and potential in MLP-like architectures. VisionPermutator [18] effectively preserves the spatial dimensions of the input tokens by separately processing token representations along the different dimensions.…”
Section: Related Work
confidence: 99%
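
A rough sketch of the dimension-wise processing attributed to VisionPermutator, again in PyTorch. The Permutator class and the plain sum fusion are simplifying assumptions; the actual paper additionally splits channels into segments before mixing along height and width.

import torch
import torch.nn as nn

class Permutator(nn.Module):
    # Mixes information along the height, width, and channel axes
    # separately, then fuses the three branches by summation.
    def __init__(self, h, w, c):
        super().__init__()
        self.mix_h = nn.Linear(h, h)  # acts along the height axis
        self.mix_w = nn.Linear(w, w)  # acts along the width axis
        self.mix_c = nn.Linear(c, c)  # acts along the channel axis

    def forward(self, x):  # x: (B, H, W, C)
        xh = self.mix_h(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)  # mix along H
        xw = self.mix_w(x.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)  # mix along W
        xc = self.mix_c(x)                                          # mix along C
        return xh + xw + xc

y = Permutator(14, 14, 96)(torch.randn(2, 14, 14, 96))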
“…Beyond the well-established realm of CNNs and transformers, MLP-Mixer [43] and Synthesizer [41] propose a new architecture that exclusively uses MLPs. Based on these pioneering studies [41,43], concurrent works [5,18,23,44] have recently been introduced. For instance, ResMLP [44] emphasizes that MLP-like architectures can take inputs of arbitrary length.…”
Section: Introduction
confidence: 99%
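
The arbitrary-length property this statement attributes to ResMLP follows from applying linear layers only over the channel axis. A minimal PyTorch sketch; the channel width and token counts are arbitrary example values.

import torch
import torch.nn as nn

channel_mlp = nn.Linear(96, 96)         # weights depend only on the channel width
short_seq = torch.randn(1, 49, 96)      # 7x7  = 49 tokens
long_seq = torch.randn(1, 196, 96)      # 14x14 = 196 tokens
print(channel_mlp(short_seq).shape)     # torch.Size([1, 49, 96])
print(channel_mlp(long_seq).shape)      # torch.Size([1, 196, 96])

spatial_mlp = nn.Linear(49, 49)         # weights tied to a fixed token count
# spatial_mlp(long_seq.transpose(1, 2)) # would raise: expected 49 features, got 196

The contrast is the point of the statement: a spatial MLP bakes the token count into its weight matrix, while a channel-only MLP works at any resolution.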
“…There is another special variant that uses only channel projection, called ConvMLP [101]. Its authors call it a hierarchical Convolutional MLP, a light-weight, stage-wise co-design of convolution layers and MLPs.…”
Section: Yu et al. from Baidu
confidence: 99%
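
The "hierarchical, stage-wise" aspect mentioned in this last statement can be sketched as a pyramid of stages, each ending in a strided convolution. This is a hypothetical miniature, not ConvMLP's published configuration; the stage helper, channel widths, and depth are illustrative choices.

import torch
import torch.nn as nn

def stage(dim_in, dim_out):
    # One pyramid stage: channel projections (1x1 convs) followed by a
    # strided convolution that halves resolution and widens channels.
    return nn.Sequential(
        nn.Conv2d(dim_in, dim_in, kernel_size=1), nn.GELU(),
        nn.Conv2d(dim_in, dim_in, kernel_size=1),
        nn.Conv2d(dim_in, dim_out, kernel_size=3, stride=2, padding=1),
    )

backbone = nn.Sequential(stage(64, 128), stage(128, 256), stage(256, 512))
feats = backbone(torch.randn(1, 64, 56, 56))
print(feats.shape)  # torch.Size([1, 512, 7, 7]) -- a hierarchical feature map

Each stage emits a feature map at a coarser resolution, which is what lets pyramid-style backbones plug into detection and segmentation heads.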