This paper presents the Motion Vectors Merging (MVM) heuristic, a method to reduce HEVC inter-prediction complexity by targeting the PU partition size decision. In the HM test model of the emerging HEVC standard, computational complexity is concentrated mostly in the inter-frame prediction step (up to 96% of the total encoder execution time under common test conditions). The goal of this work is to avoid several Motion Estimation (ME) calls during the PU inter-prediction decision, reducing the execution time of the overall encoding process. The MVM algorithm merges NxN PU partitions in order to compose larger ones. After the best PU partition is decided, ME is called to produce the best possible rate-distortion results for the selected partitions. The proposed method was implemented in the HM test model version 3.4 and provides an execution time reduction of up to 34% with negligible rate-distortion losses (0.08 dB drop and 1.9% bitrate increase in the worst case). To the best of our knowledge, no related work in the literature proposes PU-level decision optimizations. When compared with works that target CU-level fast decision methods, MVM proves competitive, achieving results as good as those works.
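As a rough illustration of the merging idea described above, the Python sketch below merges neighboring NxN partitions whose motion vectors are similar, so that ME runs once for the merged PU instead of once per sub-block. The similarity metric, the threshold, and the decision rules are illustrative assumptions, not the paper's exact algorithm.

```python
def mv_distance(a, b):
    """Manhattan distance between two motion vectors (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def merge_partitions(mvs, threshold=1):
    """Choose a PU partition from the MVs of the four NxN sub-blocks.

    mvs: four (x, y) motion vectors in raster order
         (top-left, top-right, bottom-left, bottom-right).
    Returns '2Nx2N', '2NxN', 'Nx2N', or 'NxN'.
    """
    tl, tr, bl, br = mvs
    top = mv_distance(tl, tr) <= threshold      # top pair is coherent
    bottom = mv_distance(bl, br) <= threshold   # bottom pair is coherent
    left = mv_distance(tl, bl) <= threshold
    right = mv_distance(tr, br) <= threshold
    if top and bottom and left and right:
        return '2Nx2N'   # all four MVs agree: one large PU, one ME call
    if top and bottom:
        return '2NxN'    # merge into horizontal halves
    if left and right:
        return 'Nx2N'    # merge into vertical halves
    return 'NxN'         # keep the small partitions
```

With identical motion everywhere, the four sub-blocks collapse into a single 2Nx2N PU, which is the case where the heuristic saves the most ME calls.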
Motion Estimation (ME) is a vital component of video coding that is demanding not only in computational complexity but also in off-chip memory bandwidth. These two issues are critical constraints for High Definition (HD) video coding, since a large volume of data must be processed. The multilevel data reuse scheme proposed in this paper reduces the off-chip memory bandwidth, with a direct impact on throughput and energy consumption. The scheme exploits the concept of overlapped Search Windows (SW) at more than one level and causes no video quality loss. Comparisons with related works show that this solution provides the best trade-off between on-chip memory usage and off-chip memory bandwidth reduction. The data reuse scheme was applied to a ME architecture, and the synthesis results show that this solution presents the lowest hardware resource usage and the highest operating frequency among related works. The proposed architecture is able to process 1080p videos at 25 fps, and the off-chip memory access reduction achieved by the architecture exceeds 95% compared to the traditional method.
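To see why overlapped search windows cut off-chip traffic, consider two horizontally adjacent NxN blocks: their search windows overlap, so only a narrow new strip must be fetched per block instead of the whole window. The sketch below quantifies this single-level reuse; the parameter names and the single-level formulation are illustrative assumptions (the paper's scheme extends reuse to more than one level).

```python
def window_pixels(n, srh, srv):
    """Pixels in the full search window of an NxN block,
    given horizontal/vertical search ranges srh and srv."""
    return (srh + n - 1) * (srv + n - 1)

def strip_pixels(n, srv):
    """New pixels fetched per block when the overlapped window of the
    previous block in the row is kept on-chip: only an N-pixel-wide
    vertical strip is actually new."""
    return n * (srv + n - 1)

def bandwidth_reduction(n, srh, srv):
    """Fraction of off-chip traffic avoided by exploiting SW overlap."""
    return 1.0 - strip_pixels(n, srv) / window_pixels(n, srh, srv)

# Example: 16x16 blocks with a 64x64 search range -> ~80% reduction
# from a single level of reuse alone.
print(f"{bandwidth_reduction(16, 64, 64):.1%}")
```

Note that the reduction simplifies to `1 - n / (srh + n - 1)`, so larger search ranges make the reuse proportionally more effective.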
This paper presents a memory assessment of the next-generation Versatile Video Coding (VVC) standard. The memory analyses adopt the state-of-the-art High-Efficiency Video Coding (HEVC) as a baseline. The goal is to offer insights and observations on how much the memory requirements of VVC are aggravated compared to HEVC. The adopted methodology consists of two sets of experiments: (1) an overall memory profiling and (2) an inter-prediction-specific memory analysis. The memory profiling results show that VVC accesses up to 13.4x more memory than HEVC. Moreover, the inter-prediction module remains (as in HEVC) the most resource-intensive operation in the encoder, accounting for 60%-90% of the memory requirements. The inter-prediction-specific analysis demonstrates that VVC requires up to 5.3x more memory accesses than HEVC. Furthermore, our analysis indicates that up to 23% of this growth is due to the novel VVC CU sizes (larger than 64x64).
In this work we present a high-throughput hardware architecture for the H.264/AVC intra-frame encoder that exploits the parallelism of intra prediction, forward and inverse transforms, and quantization. Since there is a strong data dependency between intra prediction and the image reconstruction loop, the latency of this path is a key design issue for high-performance coding. Considering that 77% of the total intra-encoding computation is spent in these modules, our architecture employs a 4-pixel-wide intra prediction module and a 16-pixel-wide reconstruction loop. Compared to the state of the art, our approach reduces the number of cycles to process a macroblock by 47%. Running at 150 MHz, our architecture guarantees encoding of 61 HD1080p frames per second. The developed architecture requires 73.4 MHz to encode HD1080p in real time, a 46% reduction of the frequency requirement compared to the state of the art.
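The throughput figures above can be cross-checked with back-of-the-envelope arithmetic: from 150 MHz yielding 61 fps at 1080p, the architecture spends roughly 300 cycles per macroblock, which in turn predicts a real-time (30 fps) frequency requirement close to the stated 73.4 MHz. The cycles-per-macroblock value below is an inference from the abstract's numbers, not a published constant.

```python
# HD1080p is padded to 1920x1088 for 16x16 macroblocks: 120 x 68 = 8160 MBs.
MB_PER_FRAME_1080P = (1920 // 16) * (1088 // 16)

def required_freq_mhz(cycles_per_mb, fps, mbs_per_frame=MB_PER_FRAME_1080P):
    """Clock frequency (MHz) needed to sustain `fps` frames per second."""
    return cycles_per_mb * mbs_per_frame * fps / 1e6

# Inferred from the abstract: 150 MHz / (61 fps * 8160 MBs) ~ 301 cycles/MB.
cycles_per_mb = 150e6 / (61 * MB_PER_FRAME_1080P)
print(round(cycles_per_mb))                      # ~301 cycles per macroblock
print(round(required_freq_mhz(cycles_per_mb, 30), 1))  # ~73.8 MHz for 30 fps
```

The small gap between the derived ~73.8 MHz and the reported 73.4 MHz is consistent with rounding in the published figures.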
This paper presents the Spread and Iterative Search (S&IS) motion estimation algorithm, which combines a random spread evaluation with a central iterative evaluation to avoid falling into local minima and to increase image quality for high-definition videos. For Full HD videos, S&IS achieved an average PSNR gain of 1.41 dB compared to Diamond Search (DS), with an increase of about four times in the number of evaluated blocks. Compared to Full Search (FS), S&IS showed an average PSNR loss of 1.56 dB while evaluating 73 times fewer blocks than FS. An efficient architecture for the S&IS algorithm is also presented in this paper. The architecture was designed targeting real-time processing (30 frames per second) of QFHD videos (3840×2160 pixels). The architecture was described in VHDL and synthesized for an Altera Stratix IV FPGA and for an ST 90 nm standard-cell technology. Both syntheses show that the architecture is able to process QFHD frames in real time. The standard-cell version also achieves a good trade-off among area, memory, and power consumption, processing QFHD videos with 62.2 mW.
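The two-phase structure described above can be sketched as follows: a set of randomly spread candidates samples the search area (helping to escape local minima), and an iterative one-pixel refinement then descends from the best candidate. The SAD cost, the candidate counts, and the refinement rule are illustrative assumptions, not the exact published algorithm.

```python
import random

def sad(cur, ref, bx, by, mvx, mvy, n):
    """Sum of absolute differences between the NxN block of the current
    frame at (bx, by) and the reference frame displaced by (mvx, mvy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + mvy][bx + x + mvx])
               for y in range(n) for x in range(n))

def sis_search(cur, ref, bx, by, n=8, sr=8, spread=16, seed=0):
    """Spread phase: random candidates in [-sr, sr]^2 plus (0, 0).
    Iterative phase: strict one-pixel descent around the best candidate."""
    rng = random.Random(seed)

    def cost(mv):
        return sad(cur, ref, bx, by, mv[0], mv[1], n)

    cands = [(0, 0)] + [(rng.randint(-sr, sr), rng.randint(-sr, sr))
                        for _ in range(spread)]
    best = min(cands, key=cost)
    best_cost = cost(best)
    while True:  # central iterative refinement
        neigh = [(best[0] + dx, best[1] + dy)
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                 if abs(best[0] + dx) <= sr and abs(best[1] + dy) <= sr]
        new = min(neigh, key=cost)
        new_cost = cost(new)
        if new_cost >= best_cost:   # strict improvement only -> terminates
            return best
        best, best_cost = new, new_cost
```

Because the iterative phase only moves on a strict cost decrease, it always terminates; the spread phase is what gives the algorithm a chance to start the descent near the global minimum rather than the nearest local one.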