A Dual-Critic Reinforcement Learning Framework for Frame-Level Bit Allocation in HEVC/H.265

Ho, Yung-Han; Jin, Guo-Lun; Liang, Ying; Peng, Wen-Hsiao; Li, Xiaobo

doi:10.1109/dcc50243.2021.00009

Cited by 12 publications

(10 citation statements)

References 6 publications

“…Gao et al [2016] utilize a game theory method to allocate CTU-level bit allocation and optimize for SSIM in HEVC. More recently, Reinforcement learning approaches in rate control have also been proposed to HEVC , Ho et al, 2021, Chen et al, 2018. Mao et al [2020] use an imitation learning approach on evolutionary search based policy [Salimans et al, 2017] with a feedback-based correction for rate control in VP9.…”

Section: Related Work (mentioning)

confidence: 99%

MuZero with Self-competition for Rate Control in VP9 Video Compression

Mandhane, Zhernov, Rauh, et al. 2022

Preprint

Video streaming usage has seen a significant rise as entertainment, education, and business increasingly rely on online video. Optimizing video compression has the potential to increase access and quality of content to users, and reduce energy use and costs overall. In this paper, we present an application of the MuZero algorithm to the challenge of video compression. Specifically, we target the problem of learning a rate control policy to select the quantization parameters (QP) in the encoding process of libvpx, an open source VP9 video compression library widely used by popular video-on-demand (VOD) services. We treat this as a sequential decision making problem to maximize the video quality with an episodic constraint imposed by the target bitrate. Notably, we introduce a novel self-competition based reward mechanism to solve constrained RL with variable constraint satisfaction difficulty, which is challenging for existing constrained RL methods. We demonstrate that the MuZero-based rate control achieves an average 6.28% reduction in size of the compressed videos for the same delivered video quality level (measured as PSNR BD-rate) compared to libvpx's two-pass VBR rate control policy, while having better constraint satisfaction behavior.

“…Different from the single-critic approaches [1,2,3,4], Ho et al [5] learn two separate critics, one for estimating the distortion r_D reward and the other for the rate r_R reward. They introduce a dual-critic learning algorithm that trains the RL agent by alternating the rate critic with the distortion critic according to how the RL agent behaves in encoding a GOP.…”

Section: Introduction (mentioning)

confidence: 99%

“…In this paper, we propose an action-constrained RL framework through Neural Frank-Wolfe Policy Optimization (NFWPO). Similar to the dual-critic idea [5], our scheme includes a rate critic and a distortion critic. However, unlike [5], the rate critic is utilized to specify a state-dependent feasible set, i.e.…”

Section: Introduction (mentioning)

confidence: 99%

“…Similar to the dual-critic idea [5], our scheme includes a rate critic and a distortion critic. However, unlike [5], the rate critic is utilized to specify a state-dependent feasible set, i.e. an action space that meets the rate constraint.…”

Section: Introduction (mentioning)

confidence: 99%

“…Our main contributions are as follows: (1) this work presents a novel RL framework that incorporates the Frank-Wolfe policy optimization to address the frame-level bit allocation for HEVC/H.265; (2) it outperforms both the single-critic [1] and the dual-critic [5] methods, showing comparable rate-distortion (R-D) results to the 2-pass ABR of x265. It is to be noted that our scheme performs bit allocation in one pass at test time.…”

Section: Introduction (mentioning)

confidence: 99%

Action-Constrained Reinforcement Learning for Frame-Level Bit Allocation in HEVC/H.265 through Frank-Wolfe Policy Optimization

Ho, Liang, Kao, et al. 2022

Preprint

Self Cite

This paper presents a reinforcement learning (RL) framework that leverages Frank-Wolfe policy optimization to address frame-level bit allocation for HEVC/H.265. Most previous RL-based approaches adopt the single-critic design, which weights the rewards for distortion minimization and rate regularization by an empirically chosen hyper-parameter. More recently, the dual-critic design is proposed to update the actor network by alternating the rate and distortion critics. However, the convergence of training is not guaranteed. To address this issue, we introduce Neural Frank-Wolfe Policy Optimization (NFWPO) in formulating the frame-level bit allocation as an action-constrained RL problem. In this new framework, the rate critic serves to specify a feasible action set, and the distortion critic updates the actor network towards maximizing the reconstruction quality while conforming to the action constraint. Experimental results show that when trained to optimize the video multi-method assessment fusion (VMAF) metric, our NFWPO-based model outperforms both the single-critic and the dual-critic methods. It also demonstrates comparable rate-distortion performance to the 2-pass average bit rate control of x265.
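
The three designs contrasted in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: every function and parameter name below is hypothetical, the dual-critic switching rule (use the rate critic only when the GOP exceeds its bit budget) is an assumption, since the abstract says only that the critics alternate, and the last function is a simple search over discrete candidates standing in for the action constraint, not the Frank-Wolfe update itself.

```python
from typing import List, Tuple


def single_critic_reward(r_distortion: float, r_rate: float, lam: float) -> float:
    """Single-critic design: one scalar reward that mixes both objectives.

    `lam` plays the role of the empirically chosen hyper-parameter.
    """
    return r_distortion + lam * r_rate


def pick_critic(bits_used: float, bit_budget: float) -> str:
    """Dual-critic design under an ASSUMED switching rule.

    Train against the rate critic when the GOP overshoots its budget,
    otherwise against the distortion critic.
    """
    return "rate" if bits_used > bit_budget else "distortion"


def best_feasible_action(
    candidates: List[Tuple[int, float, float]], bit_budget: float
) -> int:
    """Action-constrained idea on a toy discrete set (NOT Frank-Wolfe).

    Each candidate is (action, predicted_bits, predicted_quality). The
    predicted bits stand in for a rate critic that defines the feasible
    set; the predicted quality stands in for a distortion critic.
    """
    feasible = [c for c in candidates if c[1] <= bit_budget]
    if not feasible:
        # Nothing meets the budget: fall back to the cheapest action.
        return min(candidates, key=lambda c: c[1])[0]
    return max(feasible, key=lambda c: c[2])[0]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    toy = [(22, 130.0, 0.95), (27, 95.0, 0.90), (32, 60.0, 0.80)]
    print(single_critic_reward(1.0, -2.0, 0.5))
    print(pick_critic(120.0, 100.0))
    print(best_feasible_action(toy, 100.0))
```

The contrast the sketch aims to show is where the rate constraint enters: as a weighted term in the reward, as a choice of which critic drives the update, or as a restriction on which actions are considered at all.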