The block partition structure is a critical module of a video coding scheme for achieving significant gains in compression performance. During the exploration of the future video coding standard, named Versatile Video Coding (VVC), a new Quad Tree Binary Tree (QTBT) block partition structure was introduced. In addition to the QT block partitioning defined in the High Efficiency Video Coding (HEVC) standard, new horizontal and vertical BT partitions are enabled, which drastically increases the encoding time compared to HEVC. In this paper, we propose a lightweight and tunable QTBT partitioning scheme based on a Machine Learning (ML) approach. The proposed solution uses Random Forest classifiers to determine, for each coding block, the most probable partition modes. To minimize the encoding loss induced by misclassification, risk intervals for classifier decisions are introduced. Varying the size of the risk intervals yields a tunable trade-off between encoding complexity reduction and coding loss. The proposed solution, implemented in the JEM-7.0 software, offers encoding complexity reductions ranging from 30% to 70% on average for only a 0.7% to 3.0% Bjøntegaard Delta Rate (BD-BR) increase in the Random Access (RA) coding configuration, with very slight overhead induced by the Random Forest classifiers. The same Random Forest-based solution also efficiently reduces the complexity of the Multi-Type Tree (MTT) partitioning scheme under the VTM-5.0 software, with complexity reductions ranging from 25% to 61% on average for only a 0.4% to 2.2% BD-BR increase.
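The role of a risk interval can be illustrated with a minimal sketch. The abstract does not define the interval precisely, so the code below assumes a binary split/no-split decision, a classifier score in [0, 1], and an interval centred on 0.5; the function name `select_partition_modes` and the parameter `risk_half_width` are hypothetical and not taken from the paper.

```python
def select_partition_modes(split_probability, risk_half_width):
    """Decide which partition modes the encoder should still evaluate.

    split_probability: classifier score in [0, 1] that the block is split
                       (assumed to come from a Random Forest vote share).
    risk_half_width:   half the width of the assumed risk interval around 0.5.
    """
    lower = 0.5 - risk_half_width
    upper = 0.5 + risk_half_width
    if split_probability < lower:
        # Confident "no split": skip the split branch entirely.
        return ["no_split"]
    if split_probability > upper:
        # Confident "split": skip the no-split branch.
        return ["split"]
    # Score falls inside the risk interval: the decision is too uncertain,
    # so both modes are kept and the encoder searches them as usual.
    return ["no_split", "split"]


if __name__ == "__main__":
    for width in (0.0, 0.1, 0.3):
        print(width, select_partition_modes(0.65, width))
```

Under these assumptions, a wider interval sends more blocks to the full search (less complexity reduction, less coding loss), while a narrower interval trusts the classifier more often, which is the tunable trade-off the abstract describes.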
The Joint Video Experts Team (JVET) is developing the next-generation video coding standard called Versatile Video Coding (VVC), and its ultimate goal is to double the coding efficiency over the current state-of-the-art standard HEVC without letting complexity get out of hand. This work addresses the complexity of the VVC reference encoder, called VVC Test Model (VTM), under the All Intra coding configuration. VTM3.0 improves intra coding efficiency by 21% over the latest HEVC reference encoder, HM16.19. This coding gain primarily stems from three new coding tools: first, the extension of the HEVC Quad-Tree (QT) structure with Multi-Type Tree (MTT) partitioning; second, the increase in the number of intra prediction modes from 35 to 67; and third, the Multiple Transform Selection (MTS) scheme with two new discrete cosine/sine transforms (DCT-VIII and DST-VII). However, these new tools also play an integral part in making VTM intra encoding around 20 times as complex as that of HM. The purpose of this work is to analyze these tools individually and specify theoretical upper limits for their complexity reduction. According to our evaluations, the complexity reduction opportunity of block partitioning is up to 97%, i.e., the encoding complexity would drop to 3% of its original value for the same coding efficiency if the optimal block partitioning could be directly predicted. The respective percentages for intra mode reduction and MTS optimization are 65% and 55%. We believe these results motivate VVC codec designers to develop techniques that make the most of these opportunities.
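The upper limits quoted above are easier to compare once they are converted into remaining complexity and equivalent speed-up. The following is a small arithmetic sketch only; the helper name `remaining_and_speedup` is hypothetical, and each figure is treated as an independent per-tool bound, so the results should not be multiplied together.

```python
def remaining_and_speedup(reduction_percent):
    """Convert a complexity reduction (in %) into the remaining fraction
    of encoding complexity and the equivalent speed-up factor."""
    remaining = 1.0 - reduction_percent / 100.0
    speedup = 1.0 / remaining
    return remaining, speedup


if __name__ == "__main__":
    # Upper limits reported in the abstract for the three tools.
    limits = {"block partitioning": 97, "intra mode reduction": 65, "MTS": 55}
    for tool, percent in limits.items():
        remaining, speedup = remaining_and_speedup(percent)
        print(f"{tool}: {remaining:.2f} of original complexity, "
              f"about {speedup:.1f}x faster")
```

Read this way, a 97% reduction corresponds to roughly a 33-fold speed-up, whereas 65% and 55% correspond to less than a 3-fold speed-up each, which is why block partitioning stands out as the largest opportunity.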
VVC is the next-generation video coding standard, offering coding capability beyond that of the HEVC standard. The high computational complexity of the latest video coding standards requires high-level parallelism techniques to achieve real-time, low-latency encoding and decoding. HEVC and VVC include tile grid partitioning, which allows rectangular regions of a frame to be processed simultaneously by independent threads. The tile grid may be further partitioned into a horizontal sub-grid of Rectangular Slices (RSs), increasing the partitioning flexibility. The dynamic Tile and Rectangular Slice (TRS) partitioning solution proposed in this paper benefits from this flexibility. The TRS partitioning is carried out at the frame level, taking into account both the spatial texture of the content and the encoding times of previously encoded frames. The proposed solution searches for the partitioning configuration that offers the best trade-off between multi-thread encoding time and encoding quality loss. Experiments show that the proposed solution, compared to uniform TRS partitioning, significantly decreases multi-thread encoding time while achieving slightly better encoding quality.
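The search for a partitioning configuration can be pictured as a weighted cost minimisation. The abstract does not give the cost function, so this sketch assumes one thread per region (frame time equals the slowest region), a scalar quality-loss estimate per candidate, and a linear weighting; `pick_partitioning`, `quality_weight`, and the candidate values are all hypothetical illustrations, not measured data.

```python
def pick_partitioning(candidates, quality_weight):
    """Return the name of the candidate with the lowest combined cost.

    candidates:     list of dicts with keys
                    'name'         - label of the partitioning,
                    'region_times' - estimated encoding time per region (s),
                    'quality_loss' - estimated quality loss of the partitioning.
    quality_weight: how strongly quality loss is penalised relative to time.
    """
    def cost(candidate):
        # With one thread per region, the frame is done when the
        # slowest region finishes.
        parallel_time = max(candidate["region_times"])
        return parallel_time + quality_weight * candidate["quality_loss"]

    return min(candidates, key=cost)["name"]


if __name__ == "__main__":
    # Made-up estimates, e.g. derived from previously encoded frames.
    candidates = [
        {"name": "uniform", "region_times": [4.0, 1.0, 1.0, 2.0], "quality_loss": 0.10},
        {"name": "balanced", "region_times": [2.2, 2.0, 1.9, 1.9], "quality_loss": 0.12},
        {"name": "single_region", "region_times": [8.0], "quality_loss": 0.0},
    ]
    print(pick_partitioning(candidates, quality_weight=1.0))
```

In this toy setting the load-balanced candidate wins for a moderate weight because its slowest region finishes much earlier than the slowest region of the uniform grid, at a small extra quality cost.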