Due to its remarkable energy compaction properties, the discrete cosine transform (DCT) is employed in a multitude of compression standards, such as JPEG and H.265/HEVC. Several low-complexity integer approximations for the DCT have been proposed for both 1-D and 2-D signal analysis. The increasing demand for low-complexity, energy-efficient methods requires algorithms with even lower computational costs. In this paper, new 8-point DCT approximations with very low arithmetic complexity are presented. The new transforms are obtained by pruning state-of-the-art DCT approximations. The proposed algorithms were assessed in terms of arithmetic complexity, energy retention capability, and image compression performance. In addition, a metric combining performance and computational complexity measures was proposed. Results showed good performance and extremely low computational complexity. The introduced algorithms were mapped into systolic-array digital architectures, physically realized as digital prototype circuits using FPGA technology, and mapped to 45 nm CMOS technology. All hardware-related metrics showed low resource consumption for the proposed pruned approximate transforms. The best proposed transform according to the introduced metric presents a reduction in power consumption of 21-25%.
Keywords: DCT approximation, image compression, FPGA, pruned transforms

1 Introduction

Transform-based methods are widely employed in digital signal processing applications [1]. In this context, the efficient computation of discrete transforms has constantly attracted community efforts and the proposition of fast algorithms [2]. In particular, the 8-point discrete cosine transform (DCT) has a proven record of scientific and industrial applications, as demonstrated by the multitude of image and video coding standards that adopt it, such as JPEG [3], MPEG [4-6], H.261 [7,8], H.263 [5,9], H.264/AVC [10,11], and the recent high efficiency video coding (HEVC) [12,13]. The HEVC is capable of achieving high compression performance at approximately half the bit rate required by its predecessor H.264/AVC with the same image quality [13-16]. On the other hand, the HEVC requires a significantly higher computational complexity in terms of arithmetic operations [14-17], being 2-4 times more computationally costly than H.264/AVC [14,16]. In this context, the efficient computation of the DCT is an avenue for improving the performance of the abovementioned codecs.

* Renato J. Cintra is with the Signal Processing ).

Since its inception, several fast algorithms for the DCT have been proposed [18-23]. However, traditional algorithms aim at the computation of the exact DCT, which requires several multiplication operations. Additionally, several algorithms have achieved the theoretical lower bounds on multiplicative complexity [21,24]. As a consequence, progress in this area has turned to approximate methods [25-27]. In some applications, a simple DCT approximation can provide meaningful results at low arithmetic complexity [28]. Thus, appro...