2021
DOI: 10.48550/arxiv.2106.03106

Uformer: A General U-Shaped Transformer for Image Restoration

Abstract: In this paper, we present Uformer, an effective and efficient Transformer-based architecture, in which we build a hierarchical encoder-decoder network using the Transformer block for image restoration. Uformer has two core designs to make it suitable for this task. The first key element is a local-enhanced window Transformer block, where we use non-overlapping window-based self-attention to reduce the computational requirement and employ the depth-wise convolution in the feedforward network to further improve …
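The window-based self-attention mentioned in the abstract can be illustrated with a minimal NumPy sketch of the partition/reverse step (an illustration only, not the authors' code; Uformer additionally applies a depth-wise convolution inside the feed-forward network, which is omitted here):

```python
import numpy as np

def window_partition(x, win):
    # Split an (H, W, C) feature map into non-overlapping (win x win)
    # windows, each flattened to win*win tokens. Attention computed
    # independently per window costs O(H*W * win^2) rather than the
    # O((H*W)^2) of full global self-attention.
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def window_reverse(windows, win, H, W):
    # Inverse operation: stitch the per-window token sequences back
    # into the (H, W, C) feature map.
    C = windows.shape[-1]
    x = windows.reshape(H // win, W // win, win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

# A 4x4 map with 2-channel features splits into four 2x2 windows,
# each a sequence of 4 tokens, and round-trips exactly.
x = np.arange(32, dtype=float).reshape(4, 4, 2)
w = window_partition(x, 2)          # shape (4, 4, 2): 4 windows, 4 tokens
x2 = window_reverse(w, 2, 4, 4)     # recovers the original map
```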

Cited by 72 publications (124 citation statements)
References 81 publications
“…Recently, Transformer-based models [38,67,83,90] have achieved promising performance in various vision tasks, such as image recognition [6,14,21,39,50,51,52,75,90] and image restoration [11,40,89]. Some methods have tried to use Transformer for video modelling by extending the attention mechanism to the temporal dimension [2,3,38,53,60].…”
Section: Vision Transformer
confidence: 99%
“…Such a strategy will inevitably cause patch boundary artifacts when applied on larger images using crop-ping [14]. Local-attention based Transformers [51,95] ameliorate this issue, but they are also constrained to have limited sizes of receptive field, or to lose non-locality [23,91], which is a compelling property of Transformers and MLP models relative to hierarchical CNNs.…”
Section: Enhancement
confidence: 99%
“…Advanced components developed for high-level vision tasks have been brought into low-level vision tasks as well. Residual and dense connections [42,93,117,118], the multi-scale feature learning [19,40,95], attention mechanisms [64,89,107,108,118],…”
Section: Related Work
confidence: 99%