Feedback Recurrent Autoencoder for Video Compression

Goliński, Adam; Pourreza, Reza; Yang, Yang; Sautiere, Guillaume; Cohen, Taco

doi:10.48550/arxiv.2004.04342

Cited by 6 publications

(7 citation statements)

References 0 publications


“…Vimeo-90k [11] consists of 90,000 clips of 7 frames at 448x256 resolution collected from vimeo.com, which has been used in previous works [10], [35], [58] … [59] only contains human action videos, and previous methods that trained on Kinetics [32], [47], [60] generally report worse rate-distortion performance on diverse benchmarks (such as UVG, to be discussed below), compared to [4] which reportedly is trained on a significantly larger dataset with high resolution collected from youtube.com.…”

Section: Training Datasets

mentioning

confidence: 99%

Insights from Generative Modeling for Neural Video Compression

Yang, Yang, Marino et al. 2021. Preprint.


While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present recent neural video codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvements inspired by normalizing flows and structured priors. We propose several architectures that yield state-of-the-art video compression performance on full-resolution video and discuss their tradeoffs and ablations. In particular, we propose (i) improved temporal autoregressive transforms, (ii) improved entropy models with structured and temporal dependencies, and (iii) variable bitrate versions of our algorithms. Since our improvements are compatible with a large class of existing models, we provide further evidence that the generative modeling viewpoint can advance the neural video coding field.


“…Moreover, there has been a surge in leveraging Variational Autoencoders for data compression [50]. The experimental results presented in Section 5.2.1 are a strong indicator that the proposed architecture can realize smaller reconstruction loss with fewer latent dimensions (100 vs 128 latent variables).…”

Section: Broader Impact

mentioning

confidence: 99%

Self-Reflective Variational Autoencoder

Apostolopoulou, Rosenfeld, Dubrawski. 2020. Preprint.


The Variational Autoencoder (VAE) is a powerful framework for learning probabilistic latent variable generative models. However, typical assumptions on the approximate posterior distribution of the encoder and/or the prior seriously restrict its capacity for inference and generative modeling. Variational inference based on neural autoregressive models respects the conditional dependencies of the exact posterior, but this flexibility comes at a cost: such models are expensive to train in high-dimensional regimes and can be slow to produce samples. In this work, we introduce an orthogonal solution, which we call self-reflective inference. By redesigning the hierarchical structure of existing VAE architectures, self-reflection ensures that the stochastic flow preserves the factorization of the exact posterior, sequentially updating the latent codes in a recurrent manner consistent with the generative model. We empirically demonstrate the clear advantages of matching the variational posterior to the exact posterior: on binarized MNIST, self-reflective inference achieves state-of-the-art performance without resorting to complex, computationally expensive components such as autoregressive layers. Moreover, we design a variational normalizing flow that employs the proposed architecture, yielding predictive benefits compared to its purely generative counterpart. Our proposed modification is quite general and complements the existing literature; self-reflective inference can naturally leverage advances in distribution estimation and generative modeling to improve the capacity of each layer in the hierarchy.


“…Another popular direction (which this work follows) is to design a low-latency ML-based codec, which only features keyframe compression and forward frame extrapolation (i.e., I/P-frames only) [29,24,27]. Promising recent directions involve modeling motion using scale-space flow [2] and resolution-adaptive flow [25,16], propagating a latent state [34,13], and explicitly mitigating error propagation [28]. Yet another promising approach [14] revolves around using spatiotemporal autoencoders to encode chunks of frames.…”

Section: ML-based Video Compression

mentioning

confidence: 99%

“…Figure 7: Rate-distortion curves of traditional codecs and state-of-the-art ML codecs [2,29,13,14,27,28,25,44,42] on the UVG and MCL-JCV video datasets.…”

mentioning

confidence: 99%

ELF-VC: Efficient Learned Flexible-Rate Video Coding

Rippel, Anderson, Tatwawadi et al. 2021. Preprint.


While learned video codecs have demonstrated great promise, they have yet to achieve sufficient efficiency for practical deployment. In this work, we propose several novel ideas for learned video compression which allow for improved performance for the low-latency mode (I- and P-frames only) along with a considerable increase in computational efficiency. In this setting, for natural videos our approach compares favorably across the entire R-D curve under metrics PSNR, MS-SSIM and VMAF against all mainstream video standards (H.264, H.265, AV1) and all ML codecs. At the same time, our approach runs at least 5x faster and has fewer parameters than all ML codecs which report these figures. Our contributions include a flexible-rate framework allowing a single model to cover a large and dense range of bitrates, at a negligible increase in computation and parameter count; an efficient backbone optimized for ML-based codecs; and a novel in-loop flow prediction scheme which leverages prior information towards more efficient compression. We benchmark our method, which we call ELF-VC (Efficient, Learned and Flexible Video Coding) on popular video test sets UVG and MCL-JCV under metrics PSNR, MS-SSIM and VMAF. For example, on UVG under PSNR, it reduces the BD-rate by 44% against H.264, 26% against H.265, 15% against AV1, and 35% against the current best ML codec. At the same time, on an NVIDIA Titan V GPU our approach encodes/decodes VGA at 49/91 FPS, HD 720 at 19/35 FPS, and HD 1080 at 10/18 FPS.
