2022
DOI: 10.48550/arxiv.2201.09792
Preprint

Patches Are All You Need?

Cited by 77 publications (114 citation statements)
References 14 publications
“…ConvMixer [90] uses up to 9×9 convolutions to replace the "mixer" component of ViTs [35] or MLPs [87,88]. MetaFormer [108] suggests that a pooling layer is an alternative to self-attention.…”
Section: Concurrent Work (mentioning)
confidence: 99%
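As context for this statement, the block below is a minimal sketch of the ConvMixer-style mixing it describes: a large depthwise convolution mixes spatial locations and a 1×1 pointwise convolution mixes channels, each followed by an activation and BatchNorm. It assumes PyTorch; the dimension and kernel size are illustrative choices, not values taken from the cited papers.

```python
# Minimal sketch of a ConvMixer-style block (PyTorch assumed).
# `dim` and `kernel_size` are illustrative, not prescribed by the cited work.
import torch
import torch.nn as nn

class ConvMixerBlock(nn.Module):
    def __init__(self, dim: int = 256, kernel_size: int = 9):
        super().__init__()
        # Large depthwise convolution acts as the spatial "mixer".
        self.depthwise = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )
        # Pointwise (1x1) convolution mixes information across channels.
        self.pointwise = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.depthwise(x)   # residual around the spatial-mixing step only
        return self.pointwise(x)

x = torch.randn(1, 256, 32, 32)
print(ConvMixerBlock()(x).shape)  # torch.Size([1, 256, 32, 32])
```

Keeping the residual around the depthwise step alone mirrors how token mixing is separated from channel mixing in the ViT and MLP designs the statement compares against.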
“…Every convolution is followed by a ReLU activation and BatchNorm (BN). It has been demonstrated that depthwise separable convolution works best with large-sized convolutions [17]. To avoid the gradient vanishing problem in deeper layers, each DC layer is designed using skip connections.…”
Section: Deeper Convolution Block (mentioning)
confidence: 99%
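For illustration, the sketch below follows the description in this statement: a depthwise separable convolution (large depthwise kernel plus 1×1 pointwise), each convolution followed by ReLU and BatchNorm, wrapped in a skip connection to ease gradient flow in deeper stacks. It assumes PyTorch; the channel count and kernel size are hypothetical rather than taken from the cited work.

```python
# Hedged sketch of the "deeper convolution" block described above.
# Channel count and kernel size are illustrative assumptions.
import torch
import torch.nn as nn

class DeeperConvBlock(nn.Module):
    def __init__(self, channels: int = 64, kernel_size: int = 7):
        super().__init__()
        self.body = nn.Sequential(
            # Depthwise convolution: one filter per channel (groups=channels).
            nn.Conv2d(channels, channels, kernel_size, padding="same", groups=channels),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(channels),
            # Pointwise (1x1) convolution mixes channels.
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection helps gradients flow through deeper stacks of blocks.
        return x + self.body(x)

x = torch.randn(2, 64, 56, 56)
print(DeeperConvBlock()(x).shape)  # torch.Size([2, 64, 56, 56])
```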
“…This work extracts large context information for matching by leveraging recent advances in Vision Transformers [11,14,29]. Methods leveraging Transformers' ability to model long-term dependencies have outperformed convolutional neural networks in various high-level computer vision tasks [14,43]. Inspired by these, Jiang et al. [26] introduced an attention-based module to resolve occlusions for optical flow estimation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Moreover, POLA can be viewed as a generalization of the per-pixel overlapping attention explored in [19,34]. Compared with the per-pixel variant, POLA enjoys at least three advantages: 1) it consumes less memory, 2) it can be efficiently implemented in existing deep learning platforms, and 3) it arranges features by patch, which may provide better performance as suggested in recent research [14,29,43].…”
Section: Attention in Transformer (mentioning)
confidence: 99%
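To make the "arranges features by patch" point concrete, the sketch below computes self-attention independently inside non-overlapping windows of a feature map, which keeps each attention matrix small compared with per-pixel global attention. It is a generic window-attention illustration under assumed shapes and names, not the POLA module of the cited paper.

```python
# Illustrative patch/window self-attention (not the cited POLA module).
# Shapes and the window size are assumptions for the example.
import torch

def patch_window_attention(x: torch.Tensor, window: int = 8) -> torch.Tensor:
    """x: (B, H, W, C) feature map; H and W assumed divisible by `window`."""
    B, H, W, C = x.shape
    # Rearrange the map into (num_windows, window*window, C) token groups.
    x = x.view(B, H // window, window, W // window, window, C)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, C)
    # Plain scaled dot-product attention within each window (q = k = v = x).
    attn = torch.softmax(x @ x.transpose(-2, -1) / C**0.5, dim=-1)
    out = attn @ x
    # Restore the (B, H, W, C) layout.
    out = out.view(B, H // window, W // window, window, window, C)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

x = torch.randn(1, 32, 32, 64)
print(patch_window_attention(x).shape)  # torch.Size([1, 32, 32, 64])
```

With this arrangement the attention cost grows with the window area squared rather than the full image area squared, which is the memory advantage the statement alludes to.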