In this paper, we propose a novel Convolutional Neural Network (CNN) based video coding technique that uses a video prediction network (VPN) to support enhanced motion prediction in High Efficiency Video Coding (HEVC). Specifically, we design a CNN VPN to generate a virtual reference frame (VRF), which is synthesized from previously coded frames, to improve coding efficiency. The proposed VPN uses two sub-VPN architectures in cascade to predict the current frame at the same time instance. The VRF is expected to have higher temporal correlation with the current frame than a conventional reference frame, and thus it is substituted for a conventional reference frame. The proposed technique is incorporated into the HEVC inter-coding framework. In particular, the VRF is managed in an HEVC reference picture list, so that each prediction unit (PU) can choose a better prediction signal through rate-distortion optimization without any additional side information. Furthermore, we adaptively modify the HEVC inter-prediction mechanisms of the Advanced Motion Vector Prediction and Merge modes when the current PU uses the VRF as a reference frame. In this manner, the proposed technique can exploit the PU-wise multi-hypothesis prediction techniques in HEVC. Since the proposed VPN can perform both video interpolation and extrapolation, it can be used for the Random Access (RA) and Low Delay B (LD) coding configurations. Experimental results show that the proposed technique provides −2.9% and −5.7% coding gains in the RA and LD coding configurations, respectively, compared to the HEVC reference software, HM version 16.6.
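The PU-level choice between a conventional reference frame and the VRF can be illustrated with a minimal sketch of Lagrangian rate-distortion selection, which minimizes J = D + λR. Everything in it is an assumption made for illustration: the `Candidate` type, the `select_reference` helper, and the distortion and rate numbers are hypothetical and come from neither the paper nor the HM software.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical prediction option for a PU."""
    name: str
    distortion: float  # e.g. squared error between the PU and its prediction
    rate_bits: float   # bits needed to signal motion data and residual


def rd_cost(candidate: Candidate, lam: float) -> float:
    # Lagrangian cost J = D + lambda * R
    return candidate.distortion + lam * candidate.rate_bits


def select_reference(candidates: list[Candidate], lam: float) -> Candidate:
    # The encoder keeps whichever candidate has the lowest cost.
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Made-up numbers: the virtual frame predicts better but costs more bits.
candidates = [
    Candidate("conventional_ref", distortion=1200.0, rate_bits=40.0),
    Candidate("virtual_ref", distortion=900.0, rate_bits=46.0),
]

best = select_reference(candidates, lam=20.0)
print(best.name, rd_cost(best, 20.0))
```

With these numbers the virtual reference is selected at λ = 20, while a much larger λ (which penalizes rate more heavily) would favor the conventional reference. Because the choice is expressed through the reference index that the codec already signals, no extra side information is needed, which matches the claim in the abstract.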
Given the flexibility and performance of deep learning technology, there have been many attempts to replace existing functions inside video codecs such as High Efficiency Video Coding (HEVC) with deep-learning-based solutions. One of the most researched approaches is adopting a deep network as an image restoration filter to recover distorted compressed frames. In this paper, we instead introduce a novel way of using a deep network, in which the network chooses and transmits side information according to the type of errors and contents, inspired by the sample adaptive offset filter in HEVC. One part of the network computes the optimal offset values while another part simultaneously estimates the type of error and contents. The combination of the two subnetworks can estimate highly nonlinear and complicated errors better than conventional deep-learning-based schemes. Experimental results show that the proposed system yields average bit-rate savings of 4.2% and 2.8% for the low-delay P and random access modes, respectively, compared to conventional HEVC. Moreover, the performance improvement reaches up to 6.3% and 3.9% for higher-resolution sequences.
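For context, the sample adaptive offset idea that inspired this approach can be sketched with the band-offset mode: samples are classified into 32 intensity bands, and a signaled offset is added to samples that fall into four consecutive bands. This is a simplified sketch of the conventional filter, not the proposed network; the function name `apply_band_offset` and the sample and offset values are hypothetical. In the proposed system, the classification and the offsets would come from the two subnetworks rather than from fixed numbers.

```python
def apply_band_offset(samples, start_band, offsets, bit_depth=8):
    """Add an offset to each sample whose band lies in the signaled range.

    The sample range is split into 32 equal bands, so the band index is
    given by the top five bits of the sample value.
    """
    shift = bit_depth - 5
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = (s >> shift) - start_band  # position inside the signaled bands
        if 0 <= k < len(offsets):
            s = min(max(s + offsets[k], 0), max_val)  # clip to valid range
        out.append(s)
    return out


# Made-up reconstructed samples and offsets, for illustration only.
recon = [10, 64, 70, 80, 95, 200]
filtered = apply_band_offset(recon, start_band=8, offsets=[2, -1, 0, 3])
print(filtered)
```

Only the samples whose bands fall in the range 8 to 11 are modified; the rest pass through unchanged.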