Video compression artifact reduction aims to recover high-quality videos from low-quality compressed videos. Most existing approaches use a single neighboring frame or a pair of neighboring frames (preceding and/or following the target frame) for this task. Moreover, because frames that are high-quality overall may contain low-quality patches, and high-quality patches may exist in frames that are low-quality overall, current methods that focus on nearby peak-quality frames (PQFs) may miss high-quality details hidden in low-quality frames. To remedy these shortcomings, in this paper we propose a novel end-to-end deep neural network, called non-local ConvLSTM (NL-ConvLSTM for short), that exploits multiple consecutive frames. An approximate non-local strategy is introduced in NL-ConvLSTM to capture global motion patterns and trace spatiotemporal dependencies in a video sequence. This approximation lets the non-local module run quickly and with a small memory footprint. Our method uses the frames preceding and following the target frame to generate a residual, from which a higher-quality frame is reconstructed. Experiments on two datasets show that NL-ConvLSTM outperforms existing methods.
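To make the approximate non-local idea concrete, the sketch below shows one way inter-frame non-local aggregation can be made cheap: pairwise similarities are computed on downsampled features rather than at full resolution, shrinking the attention matrix by the fourth power of the scale factor. This is a minimal PyTorch illustration under stated assumptions; the module and parameter names (`ApproxNonLocal2Frame`, `scale`) are illustrative, not the authors' actual implementation.

```python
# Minimal sketch: approximate non-local aggregation between two frames.
# Assumption: matching on average-pooled features approximates the paper's
# low-cost non-local strategy; names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ApproxNonLocal2Frame(nn.Module):
    """Aggregate previous-frame features for each position of the current
    frame, weighted by softmax-normalized feature similarity."""
    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        self.scale = scale                                   # downsampling factor
        self.theta = nn.Conv2d(channels, channels // 2, 1)   # query embedding
        self.phi = nn.Conv2d(channels, channels // 2, 1)     # key embedding

    def forward(self, feat_cur, feat_prev):
        n, c, h, w = feat_cur.shape
        s = self.scale
        # Downsample before matching: the similarity matrix then has
        # (h*w/s^2)^2 entries instead of (h*w)^2.
        q = F.avg_pool2d(self.theta(feat_cur), s).flatten(2)   # n x c/2 x hw/s^2
        k = F.avg_pool2d(self.phi(feat_prev), s).flatten(2)    # n x c/2 x hw/s^2
        v = F.avg_pool2d(feat_prev, s).flatten(2)              # n x c   x hw/s^2
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)    # pairwise weights
        out = (attn @ v.transpose(1, 2)).transpose(1, 2)       # n x c x hw/s^2
        out = out.view(n, c, h // s, w // s)
        # Upsample the aggregated features back to full resolution.
        return F.interpolate(out, size=(h, w), mode="bilinear",
                             align_corners=False)
```

In the paper, aggregated neighbor information of this kind guides the ConvLSTM's state propagation across frames; the sketch covers only the cost-reduction step.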
Many deep-learning-based video compression artifact removal algorithms have been proposed to recover high-quality videos from low-quality compressed videos. Recently, methods have been proposed to mine spatiotemporal information by utilizing multiple neighboring frames as reference frames. However, these post-processing methods exploit adjacent frames directly while neglecting exploitable information elsewhere in the video itself. In this paper, we propose an effective reference frame proposal strategy to boost the performance of existing multi-frame approaches. In addition, we introduce a loss based on the fast Fourier transform (FFT) to further improve the effectiveness of restoration. Experimental results show that our method achieves better fidelity and perceptual performance on the MFQE 2.0 dataset than the state-of-the-art methods. Our method won Tracks 1 and 2, and ranked second in Track 3, of the NTIRE 2021 Quality Enhancement of Heavily Compressed Videos challenge.
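An FFT-based restoration loss of the kind mentioned above can be written in a few lines of PyTorch. The sketch below is one plausible formulation (penalizing the L1 distance between the 2-D spectra of restored and ground-truth frames, with real and imaginary parts compared separately so phase errors are penalized too); the paper's exact definition and weighting may differ.

```python
# Minimal sketch of an FFT-domain loss, assuming an L1 penalty on the
# real and imaginary parts of the 2-D spectrum; the paper's exact
# formulation may differ.
import torch

def fft_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (N, C, H, W) restored and ground-truth frames."""
    pred_f = torch.fft.fft2(pred, norm="ortho")
    target_f = torch.fft.fft2(target, norm="ortho")
    # Comparing real and imaginary parts penalizes both magnitude and
    # phase errors in the frequency domain.
    return (pred_f.real - target_f.real).abs().mean() + \
           (pred_f.imag - target_f.imag).abs().mean()
```

Such a term is typically added to a pixel-domain loss, e.g. `total = l1_loss + lam * fft_loss(pred, target)` with a small weight `lam` (a hypothetical setting, not from the paper).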
With the rapid spread of mobile devices and online media, more and more people prefer posting and viewing videos online. These videos are generally presented on streaming sites with image thumbnails and text titles. Faced with a huge number of videos, a viewer is much more likely to click through to one whose thumbnail is eye-catching. However, current video thumbnails are created manually, which is time-consuming and offers no quality guarantee. Moreover, static image thumbnails convey very limited information about the corresponding videos, which keeps users from finding what they really want to view. In this paper, we address a novel problem, GIF thumbnail generation, which aims to automatically generate GIF thumbnails for videos and consequently boost their click-through rate (CTR). Here, a GIF thumbnail is an animated GIF file consisting of multiple segments from the video, which conveys more information about the target video than a static image thumbnail does. To support this study, we build the first GIF thumbnail benchmark dataset, consisting of 1070 videos with a total duration of 69.1 hours and 5394 corresponding manually annotated GIFs. To solve the problem, we propose a learning-based automatic GIF thumbnail generation model called Generative Variational Dual-Encoder (GEVADEN). Since it does not rely on any user-interaction information (e.g., time-synchronized comments or real-time view counts), the model is applicable to newly uploaded and rarely viewed videos. Experiments on our dataset show that GEVADEN significantly outperforms several baselines, including video-summarization- and highlight-detection-based ones. Furthermore, we deploy a pilot application of the proposed model on an online video platform with 9814 videos covering 1231 hours, where it achieves a 37.5% CTR improvement over traditional image thumbnails. This further validates the effectiveness of the proposed model and the promising application prospects of GIF thumbnails.
Video compression artifact reduction aims to reduce the artifacts introduced by video compression algorithms and to improve the quality of compressed video frames. The critical challenge in this task is to exploit as much of the redundant high-quality information in compressed frames as possible for compensation. Two important sources of compensation, motion compensation and global context, are not comprehensively considered in previous works, leading to inferior results. The key idea of this paper is to fuse motion compensation and global context to obtain more compensation information and thereby improve the quality of compressed videos. We propose a novel Spatio-Temporal Compensation Fusion (STCF) framework built on the Parallel Swin-CNN Fusion (PSCF) block, which simultaneously learns and merges motion compensation and global context to reduce video compression artifacts. Specifically, a temporal self-attention strategy based on shifted windows is developed to capture global context efficiently, for which we use a Swin transformer layer in the PSCF block. In addition, an Ada-CNN layer in the PSCF block extracts the motion compensation. Experimental results demonstrate that our STCF framework outperforms state-of-the-art methods by up to 0.23 dB (a 27% improvement) on the MFQEv2 dataset.
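The parallel-branch structure of a PSCF-style block can be sketched compactly: one branch models context with window-based self-attention (standing in for the Swin transformer layer) and one branch extracts local compensation with convolutions (standing in for the Ada-CNN layer), after which the two outputs are fused. The PyTorch sketch below uses a plain `nn.MultiheadAttention` inside non-overlapping windows and a 1x1 fusion convolution; the class name, the fusion rule, and the use of standard attention in place of a full shifted-window Swin layer are all simplifying assumptions, not the authors' implementation.

```python
# Condensed sketch of a two-branch fusion block: windowed self-attention
# (global-context branch) in parallel with a small CNN (local branch).
# Assumptions: H and W are divisible by the window size; standard
# multi-head attention stands in for a full Swin layer.
import torch
import torch.nn as nn

class PSCFBlockSketch(nn.Module):
    def __init__(self, channels: int, window: int = 8, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        self.cnn = nn.Sequential(                 # local-compensation branch
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # merge branches

    def forward(self, x):                          # x: (N, C, H, W)
        n, c, h, w = x.shape
        ws = self.window
        # Context branch: self-attention inside each ws x ws window.
        t = x.view(n, c, h // ws, ws, w // ws, ws)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        t = self.norm(t)
        t, _ = self.attn(t, t, t)
        t = t.reshape(n, h // ws, w // ws, ws, ws, c)
        t = t.permute(0, 5, 1, 3, 2, 4).reshape(n, c, h, w)
        # Local branch, then residual fusion of the two branches.
        return x + self.fuse(torch.cat([t, self.cnn(x)], dim=1))
```

The real PSCF block also attends across neighboring frames (temporal self-attention) and uses shifted windows between successive layers; this single-frame sketch shows only the parallel attention-plus-CNN fusion pattern.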