This paper proposes a flexible and efficient implementation of the two-dimensional N-point Discrete Cosine Transform (DCT) for the High Efficiency Video Coding (HEVC) standard. The DCT is implemented through the Walsh-Hadamard Transform (WHT) followed by Givens rotations. This scheme is exploited to derive an adaptive algorithm that computes four different approximations, ranging from the complete DCT to the WHT, by selectively skipping some rotations. The work presents a statistical analysis of DCT usage and derives a pre-computation mechanism to adaptively skip rotations. Each approximation, referred to as an operating mode, is characterized by a large saving of operations at the expense of a very small quality loss. Two 2D-DCT architectures are then proposed: the first is totally unfolded, while the second is folded. The two designs are synthesized with a 90-nm standard-cell library for a clock frequency of 250 MHz. Both architectures support real-time processing of 8K UHD video sequences, at 64 and 26 fps respectively, and show higher throughput and lower gate count than state-of-the-art implementations. Moreover, power savings ranging from 28% to 56% can be achieved by working within the proposed operating modes.
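The "WHT followed by rotations" structure can be illustrated numerically. The sketch below is not the paper's factorization: it assumes an orthonormal DCT-II, a natural-order (Sylvester) WHT, and N = 8, and all function names are hypothetical. It only checks that the DCT matrix C can be written as C = R·H, where H is the WHT matrix and R is an orthogonal conversion stage. An orthogonal matrix can in general be decomposed into Givens rotations (up to sign flips), but the specific ordering and rotation set used in the paper are not reproduced here.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (assumed normalization)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def wht_matrix(n):
    """Orthonormal Walsh-Hadamard matrix in natural (Sylvester) order.
    n is assumed to be a power of two."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[s * x for x in row] for row in h]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

N = 8
C = dct_matrix(N)
H = wht_matrix(N)

# Conversion stage: because H is orthogonal, C = (C @ H^T) @ H = R @ H.
R = matmul(C, transpose(H))

identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
orth_err = max_abs_diff(matmul(R, transpose(R)), identity)  # R is orthogonal
recon_err = max_abs_diff(matmul(R, H), C)                   # R @ H gives C

# The first row of R is [1, 0, ..., 0]: the DC coefficient of the DCT
# equals the first WHT coefficient, so it needs no rotation at all.
dc_row = R[0]
```

In this picture, replacing R with the identity leaves the plain WHT, and applying all of R gives the complete DCT; the intermediate operating modes described in the abstract sit between these two extremes.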
When designing hardware-accelerated video encoding systems, it is fundamental to determine the maximum throughput needed by each subsystem so that the design can optimize the cost-performance tradeoff. One of the key modules in video coding is the 2D transform operation, which is typically subject to heavy optimization efforts. This work investigates the tradeoff between the computational power spent on the transform operations for HEVC compression and the corresponding video quality, as a function of a number of coding configuration parameters. The results provide a practical method to determine the throughput needed by the transform coding subsystem, as well as the optimal configuration of the considered coding parameters for each desired complexity-quality tradeoff, showing that large computational power savings are possible with a small quality reduction.
This work describes an approximate DCT architecture for the High Efficiency Video Coding (HEVC) standard. Since the standard requires support for multiple block sizes, architectures based on an exact implementation require a significant amount of hardware resources, namely multipliers and adders. This work aims to reduce the amount of hardware resources while keeping the rate-distortion performance nearly optimal. To achieve this goal, it exploits an exact factorization of the DCT of size N = 8, which is then extended to obtain approximate DCTs of size N = 16 and N = 32. Simulation and implementation results show that the proposed approximate solution reduces complexity by more than 43% with respect to the exact one, with an average rate-distortion performance loss of 4.74% for the worst-case (all-intra) configuration.