Building an End-to-End Spatial-Temporal Convolutional Network for Video Super-Resolution

Guo, Jun; Chao, Hongyang

doi:10.1609/aaai.v31i1.11228

Cited by 30 publications

(5 citation statements)

References 33 publications


“…Recently, many Transformer variants [67]-[69] have also emerged in the video deblurring domain. It should be pointed out that ConvLSTM is widely used in video deblurring networks [70], [71] but other ConvRNNs (ConvLSTM variants) have not been adopted for video deblurring.…”

Section: B Video Deblurring Models (mentioning)

confidence: 99%

DB-RNN: An RNN for Precipitation Nowcasting Deblurring

Ma, Zhang, Liu

2024

IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing


Precipitation nowcasting based on artificial intelligence has garnered widespread attention in the meteorological and computer communities in recent years. While new models are continuously proposed to refresh the forecasting performance, the problem of gradual blurring of forecast maps as the forecast period extends is still serious. Most models use the mean loss and the recursive prediction structure (such as MS-RNN). The mean loss always results in an average of future states, visually appearing as a blur. The recursive prediction method brings the accumulation of error (blur), causing the error (blur) of long-term predictions to increase exponentially. In this study, we add the adversarial loss and gradient loss to penalize the network to ease the blur caused by the averaging loss, and we introduce an additional deblurring network (composed of MS-RNN) behind the forecasting network (composed of MS-RNN) to alleviate the blur caused by the recursive structure, which reduces the blur of the current frame and then recursively and incrementally reduces the blur of subsequent frames. We name the proposed model DB-RNN, which can slow down the error accumulation and alleviate the blurring dilemma. Like MS-RNN, DB-RNN is compatible with multiple RNN models, such as ConvLSTM, TrajGRU, PredRNN, PredRNN++, MIM, MotionRNN, PrecipLSTM, etc. Experiments on two large radar datasets named HKO-7 and DWD-12 indicate that DB-RNN's predictions are more accurate and clear than those from MS-RNN.


“…Thus, the VSR technique is divided into two categories depending on the ways of the utilization of inter-frame information: 5 (1) method without alignment, such as Refs. 6 and 7. The non-local residual block is applied to capture long-term spatio-temporal correlations in Ref.…”

Section: Related Work (mentioning)

confidence: 99%

“…The non-local residual block is applied to capture long-term spatio-temporal correlations in Ref. 6, and Guo and Chao 7 extract inter-frame temporal information by long short term memory (LSTM). (2) Method with alignment, such as Refs.…”

Section: Related Work (mentioning)

confidence: 99%

Optical flow-free generative and adversarial network: generative and adversarial network-based video super-resolution method by optical flow-free motion estimation and compensation

Fang, Bian, Han et al.

2023

J. Electron. Imag.


Except for recovering the image detail texture information, the main difference between video super-resolution (VSR) and single-image super-resolution (SR) is that VSR focuses on alleviating the deficiency of temporal coherence between video frames. Motion estimation and motion compensation is the common technique used to strengthen the temporal correlation between frames. Most motion estimation methods are based on optical flow. The optical flow method has three basic assumptions: the movement scale is small; the luminance channel is constant; and every pixel in the same image has the same moving trend. In some scenes with complex motion, the accuracy of the underlying optical flow estimator is limited, which leads to artifacts in the video reconstruction. In recent years, generative adversarial network (GAN) has been widely used for VSR reconstruction, which can acquire more realistic texture details for single frame reconstruction. Based on the above reasons, we explore a GAN-based VSR method by optical flow-free motion estimation and compensation [optical flow-free generative and adversarial network (COFGAN)], which completes motion estimation by producing temporal dimension. COFGAN develops better motion estimation result and improves the performance of VSR without optical flow. To verify the motion estimation effect in complex scenes, long-term sequence real dynamic scene dataset realistic and dynamic scenes is applied for training and testing. We compare the performance of proposed COFGAN with earlier works such as video enhancement with task-oriented flow (TOFlow), frame-recurrent video super-resolution (FRVSR), learning temporal coherence via self-supervision for GAN-based video generation (TecoGAN), and so on. Our method achieves significant performance in the temporal coherence metrics performance of learning perceptual image patch similarity (tLP) (0.47) and performance of optical flow estimation (tOF) (7.07) with ×4 up-scaling factor. Compared to the best performance method TecoGAN in previous work, the proposed method promotes 29% of tLP and 26% of tOF. Moreover, COFGAN reaches the best accuracy on the commonly used video sequence datasets Vid4 and ToS3.


“…The recurrent framework is popular for many video processing tasks including super-resolution [7,8,9,10,11,12,13]. The recurrent framework could either be unidirectional [8], bidirectional [13], or omnidirectional [14].…”

Section: Recurrence Structure (mentioning)

confidence: 99%

OldVSR: A model for the video super-resolution and restoration of old real-world TV series

Nokap, Tayoung

2022


With the recent advance in video super-resolution (VSR) techniques, there have been many requests for super-resolving real-world old analog TV series into high-definition digital content. As excellent classical TV series may receive little to no attention due to their poor video quality, restoring them would open new business opportunities for reusing old TV contents. A problem with restoring real-world old TV series is in the complex artifacts introduced by the old interlaced scanning and compression artifacts during the digitization of old analog videos. Though recent DNN-based VSR models perform nicely on clean videos, due to the artificial nature of interlacing and compression artifacts, they fail to restore old videos into a high-definition counterpart free from noticeable artifacts. In this work, we propose OldVSR for restoring old real-world TV series with artifacts of artificial nature. The proposed model implements a bidirectional recurrent structure with first and second-order propagation where each recurrent layer implements two main functions, i.e., Feature alignment (FA) and Pyramid feature aggregation (PFA). The outputs of the forward and backward layers are merged and upsampled to produce a High-Definition (HD) frame of the input standard-definition (SD) frame. We demonstrate through experiments that our proposed OldVSR can effectively remove artifacts of artificial nature from old videos and successfully restores old TV series.
