“…A low-cost IDCT employing a single 1-D core was presented in [13] for the execution of 2-D transforms. The architecture implemented 32 coefficients per cycle to achieve high throughput of 1.6 gigapixels/s [21]. However, the design only supports 32-point HEVC inverse transform that is not suitable for HEVC application.…”
Section: Comparison and Discussion (mentioning)
confidence: 99%
“…The high throughput rate in [19][20][21] was due to high operation frequency for [19,20] and high processing rate [21]. The results in [21] is only 1-D architecture; thus, high hardware efficiency can be achieved. Based on this criterion, the proposed design has the highest hardware efficiency, as shown in Table III. 6.…”
Section: Comparison and Discussion (mentioning)
confidence: 99%
“…This architecture implemented on TSMC 65 nm runs at a clock frequency of 500 MHz and achieves throughput of 1990 megapixel/s. The architecture implemented 32 coefficients per cycle to achieve high throughput of 1.6 gigapixels/s [21]. To overcome the difficulties involved in comparing different designs with regard to speed and area overhead, we defined hardware efficiency as throughput per gate.…”
Section: Comparison and Discussion (mentioning)
confidence: 99%
“…The design in [11] utilizes about two computation paths to reach high throughput rate; however, large area cost resulting in the hardware efficiency is only 2.21. The high throughput rate in [19][20][21] was due to high operation frequency for [19,20] and high processing rate [21]. The results in [21] is only 1-D architecture; thus, high hardware efficiency can be achieved.…”
This paper presents a hardware design that supports the High Efficiency Video Coding (HEVC) inverse discrete cosine transform (IDCT) with a 32 × 32 transform unit size, using a single 1-D IDCT core with transpose memory to reduce cost. The proposed 1-D IDCT core employs 16 computation paths for high throughput and is implemented using distributed arithmetic to facilitate the sharing of hardware resources. The proposed 1-D IDCT is capable of calculating 1-D and 2-D data simultaneously along 32 parallel paths. When implemented in Taiwan Semiconductor Manufacturing Company (TSMC) 40-nm CMOS technology, the proposed 2-D transform core provides a throughput of 6.4 gigapixels/s with a gate count of 335 k. The results show that the proposed 32-point IDCT core achieves higher hardware efficiency than existing works.
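The citation statements above define hardware efficiency as throughput per gate. A minimal sketch of that ratio, applied to the figures quoted in the abstract (6.4 gigapixels/s, 335 k gates), might look as follows. The function name `hardware_efficiency` and the unit scaling (megapixels/s per kilogate) are assumptions made for illustration; the source does not state which units its Table III uses, so the result is not necessarily comparable with the 2.21 quoted for [11].

```python
def hardware_efficiency(throughput_pixels_per_s: float, gate_count: float) -> float:
    """Throughput per gate, scaled to Mpixels/s per kgate.

    The scaling is an assumption for readability; the source only says
    "throughput per gate" and does not give the units.
    """
    mpixels_per_s = throughput_pixels_per_s / 1e6
    kgates = gate_count / 1e3
    return mpixels_per_s / kgates


# Figures taken from the abstract: 6.4 gigapixels/s, 335 k gates.
proposed = hardware_efficiency(6.4e9, 335e3)
print(f"proposed design: {proposed:.2f} Mpixels/s per kgate")
```

Normalizing by gate count is what lets designs with very different area budgets be compared on one axis, which is the motivation the citing text gives for the metric.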
“…As adopted by several image encoding schemes [3], [27]- [29], the 2D DCT computation is performed by successive calls of the 1D DCT applied to the columns of the input 2D data, then to the rows of the resulting matrix. For blocks of size 8 × 8, sixteen calls of the 1D DCTs are required to furnish the 2D DCT.…”
This paper proposes a computational method for 2D 8×8 DCT based on algebraic integers. The proposed algorithm is based on the Loeffler 1D DCT algorithm, and is shown to operate with exact computation (i.e., error-free arithmetic) up to the final reconstruction step (FRS). The proposed algebraic integer architecture maintains error-free computations until an entire block of DCT coefficients having size 8×8 is computed, unlike algorithms in the literature which claim to be error-free but in fact introduce arithmetic errors between the column- and row-wise 1D DCT stages in a 2D DCT operation. Fast algorithms are proposed for the final reconstruction step employing two approaches, namely, the expansion factor and dyadic approximation. A digital architecture is also proposed for a particular FRS algorithm, and is implemented on an FPGA platform for on-chip verification. The FPGA implementation operates at 360 MHz, and is capable of a real-time throughput of 3.6 · 10⁸ 2D DCTs of size 8×8 every second, with a corresponding pixel rate of 2.3 · 10¹⁰ pixels per second. The digital architecture is synthesized using 180 nm CMOS standard cells and shows a chip area of 7.41 mm². The CMOS design is predicted to operate at 893 MHz clock frequency, at a dynamic power consumption of 13.22 mW/(MHz · V²_sup).
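The row-column decomposition mentioned in the citation statement (1D DCTs on the columns, then on the rows, sixteen calls for an 8 × 8 block) can be sketched as below. This is a plain floating-point DCT-II written for illustration only: it is not the Loeffler algorithm or the algebraic-integer architecture the abstract describes, and the names `dct_1d` and `dct_2d` are hypothetical. The last lines also check the abstract's pixel-rate arithmetic (3.6 · 10⁸ blocks/s times 64 pixels per block).

```python
import math

N = 8  # block size used in the abstract


def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence (direct O(n^2) form, floating point)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """2-D DCT by the row-column method.

    Returns (coefficients, number of 1-D DCT calls made).
    """
    n = len(block)
    calls = 0
    # Pass 1: one 1-D DCT per column.
    cols = []
    for c in range(n):
        cols.append(dct_1d([block[r][c] for r in range(n)]))
        calls += 1
    # cols[c][k] is coefficient k of column c; rebuild as a matrix indexed [k][c].
    inter = [[cols[c][k] for c in range(n)] for k in range(n)]
    # Pass 2: one 1-D DCT per row of the intermediate matrix.
    result = []
    for row in inter:
        result.append(dct_1d(row))
        calls += 1
    return result, calls


flat_block = [[1.0] * N for _ in range(N)]
coeffs, calls = dct_2d(flat_block)
print("1-D DCT calls for one 8x8 block:", calls)

# Pixel rate implied by the abstract's block throughput.
blocks_per_s = 3.6e8
pixel_rate = blocks_per_s * N * N
print(f"pixel rate: {pixel_rate:.3e} pixels/s")
```

In floating point, rounding happens inside every 1-D call, including between the column and row passes. That intermediate rounding is the error source the abstract says its algebraic-integer design defers until the final reconstruction step.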