Detail-Revealing Deep Video Super-Resolution

Tao, Xin; Gao, Hongyun; Liao, Renjie; Wang, Jue; Jia, Jiaya

doi:10.1109/iccv.2017.479

Cited by 480 publications

(487 citation statements)

References 26 publications

Supporting

Mentioning: 486

Contrasting


“…In addition to compression artifact removal, spatiotemporal correlation mining is also a hot topic in other video quality enhancement tasks, such as video super resolution (VSR). [4,18,19,23,32,37,42] estimated optical flow and warped several frames to capture the hidden spatiotemporal dependency for VSR. Although these methods work well, they rely heavily on the accuracy of motion estimation.…”

Section: Video Compression Artifact Reduction (mentioning)

confidence: 99%


Non-Local ConvLSTM for Video Compression Artifact Reduction

Gao, Tian, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Video compression artifact reduction aims to recover high-quality videos from low-quality compressed videos. Most existing approaches use a single neighboring frame or a pair of neighboring frames (preceding and/or following the target frame) for this task. Furthermore, as frames of high quality overall may contain low-quality patches, and high-quality patches may exist in frames of low quality overall, current methods focusing on nearby peak-quality frames (PQFs) may miss high-quality details in low-quality frames. To remedy these shortcomings, in this paper we propose a novel end-to-end deep neural network called non-local ConvLSTM (NL-ConvLSTM in short) that exploits multiple consecutive frames. An approximate non-local strategy is introduced in NL-ConvLSTM to capture global motion patterns and trace the spatiotemporal dependency in a video sequence. This approximate strategy makes the non-local module work in a fast and low space-cost way. Our method uses the preceding and following frames of the target frame to generate a residual, from which a higher quality frame is reconstructed. Experiments on two datasets show that NL-ConvLSTM outperforms the existing methods.



“…Different from ConvLSTM in [37,41] that is fed with only feature $F_t$ at time $t$, NL-ConvLSTM takes additional feature $F_{t-1}$ at time $(t-1)$ as input, and outputs the corresponding hidden state and cell state $H_t, C_t \in \mathbb{R}^{C_h \times N}$. Here, $C_h$ is the number of channels of hidden state and cell state.…”

Section: The Framework (mentioning)

confidence: 99%
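The quoted statement gives only the inputs ($F_t$, $F_{t-1}$) and the shape of the states ($C_h \times N$). As a rough illustration of how a non-local step could relate the two feature maps, the sketch below computes a position-to-position similarity matrix and uses it to re-weight a previous state. It makes several assumptions that are not in the excerpt: softmax-normalised dot-product similarity (swapped in here as a common choice, not necessarily the citing paper's measure), and the hypothetical function names `nonlocal_similarity` and `aggregate_state`.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def nonlocal_similarity(feat_prev, feat_cur):
    """feat_prev, feat_cur: C x N nested lists (channels x flattened positions).

    Returns an N x N matrix whose row i holds the softmax-normalised
    dot-product similarity between position i of the current features
    and every position j of the previous features (an assumed measure).
    """
    channels = len(feat_cur)
    n = len(feat_cur[0])
    sim = []
    for i in range(n):
        scores = [
            sum(feat_cur[k][i] * feat_prev[k][j] for k in range(channels))
            for j in range(n)
        ]
        sim.append(softmax(scores))
    return sim

def aggregate_state(state_prev, sim):
    """state_prev: C_h x N. Each output position i is the similarity-weighted
    sum of the previous state over all positions j, so the result is C_h x N."""
    n = len(sim)
    return [
        [sum(sim[i][j] * channel[j] for j in range(n)) for i in range(n)]
        for channel in state_prev
    ]
```

The full $N \times N$ matrix grows quadratically with the number of positions, which is presumably why the abstract above stresses an approximate strategy that is fast and cheap in memory.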


“…They use a multi-scale spatial transformer to warp the LR frame and eventually generate an HR frame through another deep network. Tao et al. [20] proposed a sub-pixel motion compensation layer for frame alignment and used a convolutional LSTM architecture in the following SR reconstruction network.…”

Section: Video Super-resolution (mentioning)

confidence: 99%

“…Due to the motion of the camera or object, the neighboring frames should be spatially aligned first so as to utilize the information and extract missing details from them. To this end, the traditional VSR methods [16,20,18,1] usually calculate the optical flow and estimate the sub-pixel motion between LR frames to warp the neighboring frames and achieve the alignment operation. However, fast and reliable flow estimation still remains a challenging problem.…”

Section: Introduction (mentioning)

confidence: 99%
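The alignment step described in the statement above (estimate flow, then warp the neighboring frame onto the reference) can be sketched as a backward warp with bilinear interpolation. This is a minimal illustration in plain Python, not any cited method's implementation. The flow convention, the border clamping, and the function names `bilinear_sample` and `warp_to_reference` are assumptions made for this example.

```python
import math

def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at real-valued (x, y).
    Coordinates outside the image are clamped to the border (an assumption)."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def warp_to_reference(neighbor, flow):
    """Backward-warp `neighbor` onto the reference frame's pixel grid.

    flow[y][x] = (u, v) is assumed to mean: the content at reference pixel
    (x, y) is found at sub-pixel location (x + u, y + v) in `neighbor`.
    """
    h, w = len(neighbor), len(neighbor[0])
    return [
        [
            bilinear_sample(neighbor, x + flow[y][x][0], y + flow[y][x][1])
            for x in range(w)
        ]
        for y in range(h)
    ]
```

Any error in `flow` moves the sampling location directly, which is the dependence on flow accuracy that the quoted statements point out.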

Deformable Non-Local Network for Video Super-Resolution

Wang, Liu, et al. 2019

IEEE Access


The video super-resolution (VSR) task aims to restore a high-resolution video frame by using its corresponding low-resolution frame and multiple neighboring frames. At present, many deep learning-based VSR methods rely on optical flow to perform frame alignment. The final recovery results will be greatly affected by the accuracy of optical flow. However, optical flow estimation cannot be completely accurate, and there are always some errors. In this paper, we propose a novel deformable non-local network (DNLN) which is non-flow-based. Specifically, we apply the improved deformable convolution in our alignment module to achieve adaptive frame alignment at the feature level. Furthermore, we utilize a non-local module to capture the global correlation between the reference frame and aligned neighboring frame, and simultaneously enhance desired fine details in the aligned frame. To reconstruct the final high-quality HR video frames, we use residual in residual dense blocks to take full advantage of the hierarchical features. Experimental results on several datasets demonstrate that the proposed DNLN can achieve state-of-the-art performance on the video super-resolution task.


“…Similar to (Kappeler et al 2016b), (Caballero et al 2017) uses a trainable motion compensation network to replace the optical flow method in (Kappeler et al 2016b). Following this fashion, Tao et al (Tao et al 2017) propose a network comprising motion estimation, motion compensation, and detail fusion to process a batch of LR frames and output HR estimate. Different from the above mentioned approaches, (Sajjadi, Vemulapalli, and Brown 2018) proposes a frame recurrent video super-resolution (FRVSR) framework that combines the previous HR estimates to generate subsequent frame.…”

Section: Related Work (mentioning)

confidence: 99%

Frame and Feature-Context Video Super-Resolution

Yan, Lin 2019

AAAI


For video super-resolution, current state-of-the-art approaches either process multiple low-resolution (LR) frames to produce each output high-resolution (HR) frame separately in a sliding window fashion or recurrently exploit the previously estimated HR frames to super-resolve the following frame. The main weaknesses of these approaches are: 1) separately generating each output frame may obtain high-quality HR estimates while resulting in unsatisfactory flickering artifacts, and 2) combining previously generated HR frames can produce temporally consistent results in the case of short information flow, but it will cause significant jitter and jagged artifacts because the previous super-resolving errors are constantly accumulated to the subsequent frames. In this paper, we propose a fully end-to-end trainable frame and feature-context video super-resolution (FFCVSR) network that consists of two key sub-networks: local network and context network, where the first one explicitly utilizes a sequence of consecutive LR frames to generate local feature and local SR frame, and the other combines the outputs of local network and the previously estimated HR frames and features to super-resolve the subsequent frame. Our approach takes full advantage of the inter-frame information from multiple LR frames and the context information from previously predicted HR frames, producing temporally consistent high-quality results while maintaining real-time speed by directly reusing previous features and frames. Extensive evaluations and comparisons demonstrate that our approach produces state-of-the-art results on a standard benchmark dataset, with advantages in terms of accuracy, efficiency, and visual quality over the existing approaches.
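The two processing schemes contrasted in this abstract differ mainly in which inputs each output frame sees. The sketch below lists those inputs by frame index only and contains no network. The window radius, the border handling (repeating edge frames), and the function names `sliding_windows` and `recurrent_schedule` are assumptions chosen for illustration.

```python
def sliding_windows(num_frames, radius):
    """Sliding-window scheme: each target frame t is super-resolved on its own
    from the LR frames in [t - radius, t + radius]. Indices past the ends of
    the sequence are clamped, i.e. edge frames are repeated (an assumption)."""
    return [
        [min(max(i, 0), num_frames - 1) for i in range(t - radius, t + radius + 1)]
        for t in range(num_frames)
    ]

def recurrent_schedule(num_frames):
    """Recurrent scheme: frame t uses its own LR frame plus the HR estimate
    produced for frame t - 1 (None for the first frame)."""
    return [(t, t - 1 if t > 0 else None) for t in range(num_frames)]
```

In the sliding-window listing no output depends on another output, which fits the flickering weakness named in the abstract. In the recurrent listing every output depends on the one before it, which is how earlier errors can accumulate.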

