ELF-VC: Efficient Learned Flexible-Rate Video Coding

Rippel, Oren; Anderson, Alexander G.; Tatwawadi, Kedar; Nair, Sanjay; Lytle, Craig; Bourdev, Lubomir

doi:10.48550/arxiv.2104.14335

Cited by 3 publications

(10 citation statements)

References 29 publications

(67 reference statements)


“…[24,49], which use 3D convolution architectures, and Refs. [3,9,11,12,16,23,26,33,36,37,48,50,52,53,69], which model P-frames as an optical flow field applied to the previous frame plus a residual model.…”

Section: Related Work (mentioning)

confidence: 99%

Implicit Neural Video Compression

Zhang, Rozendaal, Brehmer et al. 2021

Preprint

We propose a method to compress full-resolution video sequences with implicit neural representations. Each frame is represented as a neural network that maps coordinate positions to pixel values. We use a separate implicit network to modulate the coordinate inputs, which enables efficient motion compensation between frames. Together with a small residual network, this allows us to efficiently compress P-frames relative to the previous frame. We further lower the bitrate by storing the network weights with learned integer quantization. Our method, which we call implicit pixel flow (IPF), offers several simplifications over established neural video codecs: it does not require the receiver to have access to a pretrained neural network, does not use expensive interpolation-based warping operations, and does not require a separate training dataset. We demonstrate the feasibility of neural implicit compression on image and video data.



“…Neural video compression approaches are on the way catching up with traditional standards. Existing works in this field can be classified into two categories, designed for low delay setting [2,10,21,23,28,31,32,35,42] and random access setting [15,41,48,50]. The low delay setting is suitable for applications such as live streaming, which only uses the past frame(s) to predict the current frame.…”

Section: Video Compression (mentioning)

confidence: 99%

“…The aforementioned DVC [35], SSF [2] and FVC [23] are typical works improving single-reference prediction. Some multi-frame fusion modules are also designed for unidirectional prediction with multiple reference frames [23,28,42]. Obviously, fusing more reference frames benefits the RD performance, but also brings a significant increase of memory cost.…”

Section: Video Compression (mentioning)

confidence: 99%

“…The modulated loss function assigns larger λ value for the later P frames. The goal of modulated loss is to balance the reconstruction quality of frames in one GoP unit, the concept of which has been implemented in [37,42]. With its help, the later P frame will be reconstructed with better quality, mitigating temporal error propagation.…”

Section: Modulated Loss (mentioning)

confidence: 99%

“…Many neural video codecs have been designed in the past few years. Some are proposed for the low delay setting, reducing the temporal redundancies by unidirectional prediction [2,10,21,23,28,31,32,35,42]. Some can be used for the random access setting, making use of bidirectional references to predict the target frame [15,41,48,50].…”

Section: Introduction (mentioning)

confidence: 99%


Learning Cross-Scale Weighted Prediction for Efficient Neural Video Compression

Guo, Feng, Zhang et al. 2021

Preprint

In this paper, we present the first neural video codec that can compete with the latest coding standard H.266/VVC in terms of sRGB PSNR on UVG dataset for the low-latency mode. Existing neural hybrid video coding approaches rely on optical flow or Gaussian-scale flow for prediction, which cannot support fine-grained adaptation to diverse motion content. Towards more content-adaptive prediction, we propose a novel cross-scale prediction module that achieves more effective motion compensation. Specifically, on the one hand, we produce a reference feature pyramid as prediction sources, then transmit cross-scale flows that leverage the feature scale to control the precision of prediction. On the other hand, we introduce the mechanism of weighted prediction into the scenario of prediction with a single reference frame, where cross-scale weight maps are transmitted to synthesize a fine prediction result. In addition to the cross-scale prediction module, we further propose a multi-stage quantization strategy, which improves the rate-distortion performance with no extra computational penalty during inference. We show the encouraging performance of our efficient neural video codec (ENVC) on several common benchmark datasets and analyze in detail the effectiveness of every important component.


Region-of-Interest Based Neural Video Compression

Perugachi-Diaz, Sautiere, Abati et al. 2022

Preprint

Humans do not perceive all parts of a scene with the same resolution, but rather focus on a few regions of interest (ROIs). Traditional object-based codecs take advantage of this biological intuition and are capable of non-uniform allocation of bits in favor of salient regions, at the expense of increased distortion in the remaining areas: such a strategy allows a boost in perceptual quality under low rate constraints. Recently, several neural codecs have been introduced for video compression, yet they operate uniformly over all spatial locations, lacking the capability of ROI-based processing. In this paper, we introduce two models for ROI-based neural video coding. First, we propose an implicit model that is fed a binary ROI mask and is trained by de-emphasizing the distortion of the background. Second, we design an explicit latent scaling method that allows control over the quantization bin width for different spatial regions of the latent variables, conditioned on the ROI mask. Through extensive experiments, we show that our methods outperform all our baselines in terms of Rate-Distortion (R-D) performance in the ROI. Moreover, they can generalize to different datasets and to any arbitrary ROI at inference time. Finally, they do not require expensive pixel-level annotations during training, as synthetic ROI masks can be used with little to no degradation in performance. To the best of our knowledge, our proposals are the first solutions that integrate ROI-based capabilities into neural video compression models.

