2022
DOI: 10.48550/arxiv.2201.12288
Preprint

VRT: A Video Restoration Transformer

Abstract: Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Different from single image restoration, video restoration generally requires utilizing temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this by exploiting a sliding window strategy or a recurrent architecture, which either is restricted by frame-by-frame restoration or lacks long-range modelling ability. In this paper, we prop…
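The sliding-window strategy the abstract contrasts against can be made concrete with a minimal sketch (a hypothetical helper, not code from the paper): each target frame is restored from a short clip of its temporal neighbors, with indices clamped at sequence boundaries (replicate padding).

```python
import numpy as np

def sliding_window_clips(frames: np.ndarray, radius: int = 2) -> np.ndarray:
    """For each frame t, gather the clip [t - radius, t + radius],
    clamping indices at the sequence boundaries (replicate padding).

    frames: (T, H, W, C) array of low-quality frames.
    Returns: (T, 2*radius + 1, H, W, C) array of per-frame input clips.
    """
    T = frames.shape[0]
    # Index matrix: row t holds [t - radius, ..., t + radius].
    idx = np.arange(T)[:, None] + np.arange(-radius, radius + 1)[None, :]
    idx = np.clip(idx, 0, T - 1)  # replicate the first/last frame at the edges
    return frames[idx]
```

A restoration network under this strategy maps each clip to one output frame, which is exactly the frame-by-frame restriction the abstract points out: neighboring clips overlap heavily, yet each output is computed independently.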


Cited by 27 publications (69 citation statements)
References 58 publications
“…Although vision Transformer has shown its superiority on modeling long-range dependency [13,43], there are still many works demonstrating that the convolution can help Transformer achieve better visual representation [56,58,61,60,25]. Due to the impressive performance, Transformer has also been introduced for low-level vision tasks [5,54,37,29,3,62,28,26]. Specifically, [5] develops a ViT-style network and introduces multi-task pre-training for image processing.…”
Section: Vision Transformer
confidence: 99%
“…SwinIR [29] proposes an image restoration Transformer based on [36]. [3,28] introduce Transformer-based networks to video restoration. [26] adopts self-attention mechanism and multirelated-task pre-training strategy to further refresh the state-of-the-art of SR.…”
Section: Vision Transformer
confidence: 99%
“…Attention-based networks, i.e., Transformers, have shown great performance and gained much popularity in various high-level computer vision tasks [7,9,16,34,35,53,55]. Recently, Transformer has also been introduced for low-level vision and tends to learn global interactions to focus on enhancing details and important regions [8,11,31,32,52]. Chen et al [11] were the first to propose using the Transformer-based backbone IPT for various image restoration problems.…”
Section: Related Work
confidence: 99%
“…Further, we apply a pyramid structure to improve the alignment on top of the flow-guided DCN. On the other hand, the self-attention mechanism and Transformer have shown promising performance in most computer vision tasks [31,32,35]. Therefore, to better use the inter-frame information, we incorporate Swin Transformer blocks and groups in our architecture to capture both global and local contexts for long-range dependency modeling [32,35].…”
Section: Introduction
confidence: 99%
“…However, the RNN-based methods inevitably suffer from the vanishing gradient problem and have difficulty in capturing long-range temporal dependencies. Recently, the emerging Transformer model has been applied in image and video restoration tasks (Cai et al, 2021b; Liang et al, 2022; Lin et al, 2022b; Cao et al, 2021; Cai et al, 2022). Nonetheless, the token-based self-attention module has enormous computational and memory cost when restoring long video sequences.…”
Section: Video Restoration
confidence: 99%
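The quadratic cost the last excerpt refers to can be illustrated with a back-of-the-envelope sketch (illustrative only; the count ignores the linear projections and softmax, and the token counts below are assumed, not taken from any cited paper):

```python
def attention_macs(num_tokens: int, dim: int) -> int:
    """Multiply-accumulates for the two N x N stages of full self-attention:
    scores = Q @ K^T (N*N*d MACs) and out = A @ V (N*N*d MACs).
    The linear Q/K/V/output projections are omitted; they scale only
    linearly in N, so the N*N terms dominate for long sequences.
    """
    return 2 * num_tokens * num_tokens * dim

# Tokens grow linearly with video length, so doubling the number of
# frames quadruples both the attention MACs and the N x N attention map.
short = attention_macs(num_tokens=16 * 64, dim=96)  # e.g. 16 frames, 64 tokens each
long_ = attention_macs(num_tokens=32 * 64, dim=96)  # 32 frames
```

This quadratic growth in the sequence length is why window-based designs (such as the Swin-style blocks mentioned in the excerpts above) restrict attention to local windows rather than attending over all tokens of a long video.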