An HEVC multi-size DCT hardware with constant throughput and supporting heterogeneous CUs

Goebel, Jones; Paim, Guilherme; Agostini, Luciano; Porto, Marcelo

doi:10.1109/iscas.2016.7539019

Cited by 20 publications

(10 citation statements)

References 5 publications

“…A low-cost IDCT employing a single 1-D core was presented in [13] for the execution of 2-D transforms. The architecture implemented 32 coefficients per cycle to achieve high throughput of 1.6 gigapixels/s [21]. However, the design only supports 32-point HEVC inverse transform that is not suitable for HEVC application.…”

Section: Comparison and Discussion (mentioning)

confidence: 99%

“…The high throughput rate in [19][20][21] was due to high operation frequency for [19,20] and high processing rate [21]. The results in [21] is only 1-D architecture; thus, high hardware efficiency can be achieved. Based on this criterion, the proposed design has the highest hardware efficiency, as shown in Table III. 6.…”

Section: Comparison and Discussion (mentioning)

confidence: 99%

“…This architecture implemented on TSMC 65 nm runs at a clock frequency of 500 MHz and achieves throughput of 1990 megapixel/s. The architecture implemented 32 coefficients per cycle to achieve high throughput of 1.6 gigapixels/s [21]. To overcome the difficulties involved in comparing different designs with regard to speed and area overhead, we defined hardware efficiency as throughput per gate.…”

Section: Comparison and Discussion (mentioning)

confidence: 99%

“…The design in [11] utilizes about two computation paths to reach high throughput rate; however, large area cost resulting in the hardware efficiency is only 2.21. The high throughput rate in [19][20][21] was due to high operation frequency for [19,20] and high processing rate [21]. The results in [21] is only 1-D architecture; thus, high hardware efficiency can be achieved.…”

Section: Comparison and Discussion (mentioning)

confidence: 99%

High‐throughput IDCT architecture for high‐efficiency video coding (HEVC)

Chen

2017

Circuit Theory & Apps

This paper presents a hardware design capable of supporting high-efficiency video coding inverse discrete cosine transform (IDCT) with a 32 × 32 transform unit size, using a single 1-D IDCT core with transpose memory to reduce costs. The proposed 1-D IDCT core employs 16 computation paths for high throughput and is implemented using distributed arithmetic to facilitate the sharing of hardware resources. The proposed 1-D IDCT is capable of calculating 1-D and 2-D data simultaneously along 32 parallel paths. When implemented using Taiwan Semiconductor Manufacturing Company (TSMC) 40-nm CMOS technology, the proposed 2-D transform core provides throughput of 6.4 gigapixels/s with a gate count of 335 k. The results show that a superior hardware efficiency can be achieved in the proposed 32-point IDCT core compared with the existing works.
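The citing paper defines hardware efficiency as throughput per gate. As a rough worked example, the sketch below applies that ratio to the figures in the abstract above (6.4 gigapixels/s, 335 k gates). The function name `hardware_efficiency` and the choice of units (megapixels/s per kilogate) are assumptions made here for illustration; the paper's own table may normalize differently, so the result should not be read as its reported value.

```python
def hardware_efficiency(throughput_mpixels_per_s, gate_count_k):
    """Throughput per gate, in (Mpixels/s) per kgate.

    The unit choice is an assumption for illustration; the source only
    states that hardware efficiency is "throughput per gate".
    """
    if gate_count_k <= 0:
        raise ValueError("gate count must be positive")
    return throughput_mpixels_per_s / gate_count_k


# Figures taken from the abstract: 6.4 gigapixels/s and 335 k gates.
efficiency = hardware_efficiency(6400.0, 335.0)
print(f"{efficiency:.2f} Mpixels/s per kgate")
```

Because the ratio divides out clock frequency and area together, it only gives a fair comparison when the designs being compared report throughput and gate count on the same basis.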

“…As adopted by several image encoding schemes [3], [27]- [29], the 2D DCT computation is performed by successive calls of the 1D DCT applied to the columns of the input 2D data, then to the rows of the resulting matrix. For blocks of size 8 × 8, sixteen calls of the 1D DCTs are required to furnish the 2D DCT.…”

Section: The 2D DCT (mentioning)

confidence: 99%
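The row-column decomposition described in the statement above can be illustrated with a minimal sketch. It uses a direct, unoptimized orthonormal DCT-II as the 1D kernel (an assumption made here for clarity; it is not the Loeffler factorization or any hardware architecture from the cited works) and counts the 1D calls so the "sixteen calls for 8 × 8" figure can be checked. The names `dct_1d` and `dct_2d` are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a length-N sequence, direct O(N^2) form."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Row-column 2D DCT of a square block.

    Applies the 1D DCT to every column, then to every row of the
    intermediate matrix. Returns (coefficients, number_of_1d_calls).
    """
    n = len(block)
    calls = 0

    # Column pass: one 1D DCT per column.
    cols = []
    for c in range(n):
        cols.append(dct_1d([block[r][c] for r in range(n)]))
        calls += 1

    # Rearrange so that intermediate[k][c] holds vertical frequency k
    # of column c.
    intermediate = [[cols[c][k] for c in range(n)] for k in range(n)]

    # Row pass: one 1D DCT per row of the intermediate matrix.
    coeffs = []
    for r in range(n):
        coeffs.append(dct_1d(intermediate[r]))
        calls += 1

    return coeffs, calls


coeffs, calls = dct_2d([[1.0] * 8 for _ in range(8)])
print(calls)  # 8 column calls + 8 row calls
```

For an N × N block the count is 2N, which gives sixteen for N = 8. In hardware, the rearrangement step between the two passes corresponds to the transpose memory that single-core designs place between the column and row stages.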

Computation of 2D 8×8 DCT Based on the Loeffler Factorization Using Algebraic Integer Encoding

Coelho, Nimmalapalli, Dimitrov, et al.

2018

IEEE Trans. Comput.

This paper proposes a computational method for 2D 8×8 DCT based on algebraic integers. The proposed algorithm is based on the Loeffler 1D DCT algorithm, and is shown to operate with exact computation (i.e., error-free arithmetic) up to the final reconstruction step (FRS). The proposed algebraic integer architecture maintains error-free computations until an entire block of DCT coefficients having size 8×8 is computed, unlike algorithms in the literature which claim to be error-free but in fact introduce arithmetic errors between the column- and row-wise 1D DCT stages in a 2D DCT operation. Fast algorithms are proposed for the final reconstruction step employing two approaches, namely, the expansion factor and dyadic approximation. A digital architecture is also proposed for a particular FRS algorithm, and is implemented on an FPGA platform for on-chip verification. The FPGA implementation operates at 360 MHz, and is capable of a real-time throughput of 3.6 × 10^8 2D DCTs of size 8×8 every second, with corresponding pixel rate of 2.3 × 10^10 pixels per second. The digital architecture is synthesized using 180 nm CMOS standard cells and shows a chip area of 7.41 mm². The CMOS design is predicted to operate at 893 MHz clock frequency, with a dynamic power consumption of 13.22 mW/(MHz · V_sup²).
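The FPGA throughput figures in the abstract above are mutually consistent if the design completes one 8×8 2D DCT per clock cycle; that rate is an inference from the quoted numbers, not something the abstract states outright. A small sketch of the arithmetic, with the hypothetical helper `pixel_rate`:

```python
def pixel_rate(clock_hz, blocks_per_cycle=1, block_size=8):
    """Return (blocks per second, pixels per second).

    blocks_per_cycle=1 is an assumption inferred from the abstract's
    figures (360 MHz clock, 3.6e8 blocks per second).
    """
    blocks_per_second = clock_hz * blocks_per_cycle
    pixels_per_second = blocks_per_second * block_size * block_size
    return blocks_per_second, pixels_per_second


blocks, pixels = pixel_rate(360_000_000)
print(f"{blocks:.1e} blocks/s, {pixels:.3e} pixels/s")
```

Each 8×8 block carries 64 pixels, so 3.6 × 10^8 blocks per second corresponds to about 2.3 × 10^10 pixels per second, matching the abstract after rounding.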
