2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020
DOI: 10.1109/cvpr42600.2020.00461

Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction

Abstract: Video prediction is a pixel-wise dense prediction task to infer future frames based on past frames. Missing appearance details and motion blur are still two major problems for current predictive models, which lead to image distortion and temporal inconsistency. In this paper, we point out the necessity of exploring multi-frequency analysis to deal with the two problems. Inspired by the frequency band decomposition characteristic of Human Vision System (HVS), we propose a video prediction network based on multi…

Cited by 81 publications (33 citation statements)
References 40 publications

“…The information loss problem during extracting features has not been explored yet, which is an important component in video prediction systems. To solve this problem, Yu et al [3] proposed a conditionally reversible network (CrevNet) that uses reversible architectures for information protection and Jin et al [10] employed additional high-frequency information for better prediction. However, in above works, the encoded information can only be indirectly utilized by decoders through predictive memories and more direct interactions between encoders and decoders are needed to be organized.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…To reduce the computation load, Yu et al [17] built a Conditionally Reversible Network (CrevNet) and have achieved satisfactory results in next-frame prediction task, however, the quality degradation problem in multi-frame prediction task is still needed to be solved. To generate satisfactory results in multi-frame prediction task, Jin et al [18] utilized multi-frequency information of videos to predict video frames with fine details. However, the computation load is still prohibitively high.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…There is a method called video prediction. Many works (Straka et al, 2020;Hu et al, 2020;Jin et al, 2020;Mathieu et al, 2015;Finn et al, 2016;Oh et al, 2015;Lotter et al, 2016;Lotter et al, 2020;Qiu et al, 2019) have proposed different network structures to predict human movement, digital movement and robot movement. When some pixels are occluded, the unoccluded pixels can still be used for prediction.…”
Section: Introduction
Citation type: mentioning
confidence: 99%