2017
DOI: 10.1002/cta.2376

High‐throughput IDCT architecture for high‐efficiency video coding (HEVC)

Abstract: This paper presents a hardware design capable of supporting high-efficiency video coding inverse discrete cosine transform (IDCT) with a 32 × 32 transform unit size, using a single 1-D IDCT core with transpose memory to reduce costs. The proposed 1-D IDCT core employs 16 computation paths for high throughput and is implemented using distributed arithmetic to facilitate the sharing of hardware resources. The proposed 1-D IDCT is capable of calculating 1-D and 2-D data simultaneously along 32 parallel paths. Whe…
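The abstract's central idea, a 2-D IDCT built from a single 1-D IDCT core plus a transpose step, rests on the separability of the transform. The following is a minimal software sketch of that row-column decomposition, not the paper's hardware design: it assumes an orthonormal floating-point IDCT rather than the HEVC integer transform matrices or distributed arithmetic, and the function names (`idct_1d`, `transpose`, `idct_2d`) are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Orthonormal 1-D inverse DCT (DCT-III) of a list of floats.

    Assumption: floating-point reference math, not HEVC integer arithmetic.
    """
    n = len(coeffs)
    out = []
    for x in range(n):
        total = 0.0
        for k in range(n):
            # The DC term uses a different normalisation factor than the AC terms.
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(total)
    return out


def transpose(block):
    """Swap rows and columns; stands in for the transpose memory."""
    return [list(col) for col in zip(*block)]


def idct_2d(block):
    """2-D IDCT computed by reusing the same 1-D routine twice."""
    first_pass = [idct_1d(row) for row in block]                   # 1-D pass over rows
    second_pass = [idct_1d(row) for row in transpose(first_pass)]  # 1-D pass over columns
    return transpose(second_pass)                                  # restore orientation
```

Reusing one 1-D routine for both passes is the software analogue of sharing a single 1-D core in hardware; the transpose between passes is the role the abstract assigns to the transpose memory.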

Citations: cited by 4 publications (6 citation statements)
References: 21 publications (61 reference statements)
“…Compared to the multiple computation path IDCT [19], the proposed 2-D IDCT core is composed of one 1-D transform core and one transposed memory (TMEM) to achieve a small-area design. The 1-D IDCT core utilizes the proposed data shared in the time scheme such that the throughput rate can be maintained the same as the operation frequency.…”
Section: Proposed Architecture; citation type: mentioning
confidence: 99%
“…However, the design only supports the 32-point HEVC inverse transform, which is insufficient for HEVC application. An ultrahigh-throughput design was presented in [19]. The 16 parallel computation streams achieved a throughput rate of 6.4 GP/s for supporting multiple transform dimensions when implemented into 40-nm CMOS technology.…”
Section: Comparison With Existing Studies; citation type: mentioning
confidence: 99%
“…T-memory enables single-cycle to write and read access in both row and column dimensions to efficiently enhance on-chip learning. Then, in the high-efficiency video coding inverse discrete cosine transform (IDCT), the T-memory was developed to reduce costs and to improve the throughput [2]. In addition, the popular convolution neural networks (CNNs) has employed the T-memory to support efficient data transformation and data reuse [3].…”
Section: Introduction; citation type: mentioning
confidence: 99%
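
The row-and-column access pattern that the last statement attributes to T-memory can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `TransposeBuffer` and its methods are hypothetical, and the model says nothing about the single-cycle timing or the circuit structure of the cited designs.

```python
class TransposeBuffer:
    """Toy model of a transpose memory: data goes in by row, comes out by column."""

    def __init__(self, size):
        self.size = size
        self.cells = [[0] * size for _ in range(size)]

    def write_row(self, index, values):
        """Store one full row, e.g. one output vector of the first 1-D pass."""
        if len(values) != self.size:
            raise ValueError("row length must match buffer size")
        self.cells[index] = list(values)

    def read_column(self, index):
        """Fetch one full column, e.g. one input vector of the second 1-D pass."""
        return [row[index] for row in self.cells]


# Usage: fill the buffer row by row, then read it back column by column.
buf = TransposeBuffer(2)
buf.write_row(0, [1, 2])
buf.write_row(1, [3, 4])
first_column = buf.read_column(0)
```

In hardware, the point of such a memory is that both access directions are equally cheap; the Python model only captures the data ordering, not the cost.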