2023
DOI: 10.1007/978-3-031-25063-7_42

Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration

Abstract: In smartphones and compact cameras, the Image Signal Processor (ISP) transforms the RAW sensor image into a human-readable sRGB image. Most popular super-resolution methods start from an sRGB image and upscale it further, improving its quality. However, modeling the degradations in the sRGB domain is complicated because of the non-linear ISP transformations. Despite this known issue, only a few methods work directly with RAW images and tackle real-world sensor degradations. We tackle blind image super-resolution…


Cited by 62 publications (30 citation statements)
References 112 publications

“…Originally, the Transformer [19] was designed for Natural Language Processing (NLP) applications, where it has achieved state-of-the-art performance and become the de facto standard solution for many NLP tasks [20], such as the cutting-edge and high-profile Chat Generative Pre-trained Transformer (ChatGPT) [37]. Inspired by the great success of the Transformer in NLP, the Vision Transformer (ViT), a variant of the Transformer designed specifically for image processing, has recently gained much popularity in the computer vision community [18,38]. Different from the CNN structure, the ViT first converts 2-D images into 1-D sequences and then applies the self-attention mechanism for feature extraction.…”
Section: Vision Transformer (mentioning)
confidence: 99%
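
To make the patchify-then-attend idea in the statement above concrete, here is a minimal, hypothetical PyTorch sketch (module and parameter names are illustrative, not taken from any cited implementation): a 2-D image is cut into non-overlapping patches, flattened into a 1-D token sequence, and passed through multi-head self-attention.

import torch
import torch.nn as nn

class TinyViTBlock(nn.Module):
    """Minimal sketch: split an image into patches (2-D -> 1-D sequence),
    embed them, and apply multi-head self-attention. Illustrative only."""

    def __init__(self, patch_size=4, in_ch=3, dim=64, heads=4):
        super().__init__()
        # Non-overlapping patch embedding via a strided convolution.
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                           # x: (B, C, H, W)
        tokens = self.patch_embed(x)                # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim) 1-D token sequence
        h = self.norm(tokens)
        attn_out, _ = self.attn(h, h, h)            # self-attention over all patch tokens
        return tokens + attn_out                    # residual connection

x = torch.randn(1, 3, 32, 32)
print(TinyViTBlock()(x).shape)                      # torch.Size([1, 64, 64])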
“…Benefiting from this design, ViT-based algorithms have demonstrated clear superiority over CNNs and achieved numerous breakthroughs in fundamental vision tasks such as image classification [20,21,23], semantic segmentation [24,39,40] and super-resolution [18,41,28,42].…”
Section: Vision Transformer (mentioning)
confidence: 99%
“…SwinIR [30] proposes an efficient Transformer-based SR model that fully exploits the Swin Transformer structure, and it outperforms pure convolutional networks with fewer parameters and FLOPs. Swin2SR [8] further improves the network structure by introducing SwinV2 attention, and proposes an auxiliary loss and a high-frequency loss for compressed images. However, none of these methods achieves real-time performance.…”
Section: Efficient Image Super-resolution (mentioning)
confidence: 99%
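
As a rough sketch of the high-frequency loss idea mentioned in the statement above (an assumed approximation, not the exact Swin2SR formulation), one could penalize the L1 distance between the high-frequency residuals of the prediction and the target, where the high-frequency component is taken as the image minus a blurred copy:

import torch
import torch.nn.functional as F

def high_frequency_loss(pred, target, kernel_size=5):
    """Hypothetical high-frequency loss sketch: compare the detail layer
    (image minus its blurred copy) of prediction and target.
    Illustrative approximation, not the loss used in Swin2SR."""
    pad = kernel_size // 2
    # Box blur as a cheap low-pass filter, applied per channel (grouped conv).
    weight = torch.ones(pred.size(1), 1, kernel_size, kernel_size,
                        device=pred.device) / kernel_size ** 2
    blur = lambda x: F.conv2d(F.pad(x, [pad] * 4, mode="reflect"),
                              weight, groups=x.size(1))
    hf_pred = pred - blur(pred)        # high-frequency residual of the prediction
    hf_target = target - blur(target)  # high-frequency residual of the target
    return F.l1_loss(hf_pred, hf_target)

pred = torch.rand(2, 3, 64, 64)
target = torch.rand(2, 3, 64, 64)
print(high_frequency_loss(pred, target))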
“…[18] Much of the early work on super-resolution dealt with scenarios without compression. However, recent works [19,20] address the more realistic scenario where a downsampled image or video is compressed for transmission.
Section: Related Work (mentioning)
confidence: 99%
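
A common way to reproduce the compressed-transmission scenario described above is to synthesize training pairs by downsampling an image and then applying lossy JPEG compression. The sketch below is a generic, assumed degradation pipeline, not the exact model used in the cited works.

import torch
import torch.nn.functional as F
from torchvision.io import encode_jpeg, decode_jpeg

def simulate_compressed_lr(img, scale=4, quality=30):
    """Assumed degradation pipeline: bicubic downsampling followed by JPEG
    compression, mimicking an image that is downsampled and then compressed
    for transmission. img: uint8 tensor of shape (3, H, W)."""
    lr = F.interpolate(img.unsqueeze(0).float() / 255.0,
                       scale_factor=1.0 / scale, mode="bicubic",
                       align_corners=False, antialias=True)
    lr_uint8 = (lr.clamp(0, 1) * 255).round().to(torch.uint8).squeeze(0)
    # Encode to JPEG bytes at the chosen quality, then decode back to a tensor.
    return decode_jpeg(encode_jpeg(lr_uint8, quality=quality))

lr = simulate_compressed_lr(torch.randint(0, 256, (3, 128, 128), dtype=torch.uint8))
print(lr.shape)  # torch.Size([3, 32, 32])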