2023
DOI: 10.3390/s23052385
Vision Transformers in Image Restoration: A Survey

Abstract: The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a long time, Convolutional Neural Networks (CNNs) predominated in most computer vision tasks. Now, both CNNs and ViTs are efficient approaches that demonstrate powerful capabilities to restore a better version of a low-quality input image. In this study, the efficiency of ViT in image restoration is studied extensively, and ViT architectures are classified for each image restoration task. Seven image re…

Cited by 38 publications (15 citation statements) · References 110 publications
“…Patch embeddings, along with the classification token and positional embeddings, are summed and fed into a transformer encoder for image recognition [44]. A visual representation is presented in Fig 3, adapted from [46].…”
Section: Methods
confidence: 99%
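The pipeline quoted above (patchify, linearly embed, prepend a class token, add positional embeddings) can be sketched in a few lines. This is an illustrative NumPy sketch, not the surveyed paper's implementation; the random projections stand in for learned weights, and `vit_tokens` is a hypothetical helper name.

```python
import numpy as np

def vit_tokens(image, patch, d_model, rng):
    """Sketch of the ViT input pipeline: split the image into patches,
    linearly embed them, prepend a class token, and add positional
    embeddings. Random matrices stand in for learned parameters."""
    h, w, c = image.shape
    patches = (image.reshape(h // patch, patch, w // patch, patch, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * c))          # (N, P*P*C) flattened patches
    proj = rng.standard_normal((patch * patch * c, d_model))  # patch-embedding projection
    tokens = patches @ proj                                   # (N, d_model)
    cls = rng.standard_normal((1, d_model))                   # classification token
    tokens = np.concatenate([cls, tokens], axis=0)            # (N+1, d_model)
    pos = rng.standard_normal(tokens.shape)                   # positional embeddings
    return tokens + pos                                       # summed encoder input

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))          # toy 8x8 RGB "image"
seq = vit_tokens(x, patch=4, d_model=16, rng=rng)
# 8x8 image with 4x4 patches -> 4 patch tokens + 1 class token
```

The summed sequence `seq` is what would be fed to the transformer encoder.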
“…RMSProp. Another optimization approach, root-mean-square propagation (RMSProp), adapts the learning rate for each parameter [46]. The ‘running average’ is calculated as follows (Eq 12): …”
Section: Methods
confidence: 99%
“…The RGB images captured in intelligent driving often feature complex urban lighting and a variety of targets, which is markedly different from indoor scenes or low-light datasets with no light sources or only a single weak light source. Therefore, we chose CNN [ 38 ] and Vision Transformer [ 39 ] as different baselines to address the varying complexities of degradation in low and high-frequency information. The degradation on the Illumination Map is linear grayscale attenuation, which can be processed with a smaller parameter CNN.…”
Section: Methods
confidence: 99%
“…Self-attention is a module that takes image patches as input, transforms them, and returns token values as output. A popular transformer model used for image processing is the Vision Transformer (ViT) [11].…”
Section: Restormer
confidence: 99%
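The token-in, token-out behavior described in this statement is single-head scaled dot-product self-attention. Below is a minimal sketch, assuming random projection matrices in place of learned weights; it illustrates the mechanism only, not any specific restoration model.

```python
import numpy as np

def self_attention(tokens):
    """Single-head self-attention over patch tokens: project each token
    to queries/keys/values, softmax the scaled dot-product scores over
    keys, and return each token as a weighted sum of values."""
    n, d = tokens.shape
    rng = np.random.default_rng(1)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))  # stand-in weights
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(d)                       # scaled dot-product scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V                                  # attended token values

tokens = np.random.default_rng(2).standard_normal((5, 16))  # e.g. 5 patch tokens
out = self_attention(tokens)
```

Each output row mixes information from every input token, which is why transformer-based restorers can model long-range dependencies that convolutions capture only through deep stacks.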