A 4K×2K H.265/HEVC video codec chip is fabricated in a 28nm CMOS process with a core area of 2.16mm 2 . This LSI chip integrates a dual-standard (H.265 and H.264) video codec and a series of prevalent (VC-1, WMV-7/8/9, VP-6/8, AVS, RM-8/9/10, MPEG-2/4) decoders into a single chip. It contains 3,558K logic gates and 308KB of internal SRAM. Moreover, it simplifies intra/inter-ratedistortion optimization (RDO) processes and reduces external bandwidth via line-store SRAM pool (LSSP) and data-bus translation (DBT) techniques. For smartphone applications, it completes real-time HEVC encoding and decoding with 4096×2160 resolution and 30fps, and consumes 126.73mW (0.5nJ/pixel) of core power dissipation at 0.9V, at 494MHz (encoding) and 350MHz (decoding). 1080HD and 720HD resolutions are reported as well. The chip features are summarized in Fig. 18.6.1.
A first dual-standard video encoder and decoder LSI providing VP8 (i.e. video format of WebM project for use of web's video) or H.264/AVC video recording and playback simultaneously is implemented with 28nm CMOS and occupies 1.94mm 2 of core area. Several area-efficient techniques are realized, leading to 43.6% of area reduction. A new rate control is designed to facilitate the adaptation of video data and frame rates for network services. Two fast algorithms and new bool encoder/decoder are proposed to enhance power efficiency. This chip consumes 28.15mW and 10.02mW of VP8 encoder and decoder average power for 1080p@30fps at 0.9V, respectively.
Abstract-This paper presents efficient VLSI architectures of the shape-adaptive discrete cosine transform (SA-DCT) and its inverse transform (SA-IDCT) for MPEG-4. Two of the challenges encountered during the exploitation of more efficient architectures for the SA-DCT and SA-IDCT are addressed. One challenge is to handle the architectural irregularity due to the shape-adaptive nature. The other one is to provide acceptable throughput using minimal hardware. In the algorithm-level optimization, this work exploits the numerical properties found in the transform matrices of various lengths, and derives a fine-grained zero-skipping scheme for the IDCT which can perform 22.6% more zero-skipping than the common vector-based coarse-grained zero-skipping scheme does. In the architecture-level design, the 1-D variable-length DCT/IDCT architectures designed on the basis of the numerical properties are proposed. An auto-aligned transpose memory that aligns the data of different lengths is also incorporated. In addition, a zero-index table is also included in the transpose memory to support the fine-grained zero-skipping in the SA-IDCT. The synthesized designs of the SA-DCT and SA-IDCT are implemented using UMC 0.18-m technology. The SA-DCT architecture has 26 635 gates, and its average cycle-throughput is 0.66 pixels/cycle, which is comparable to other proposed architectures. On the other hand, the SA-IDCT architecture has 29 960 gates, and its cycle-throughput is 6.42 pixels/cycle. While decoding for CIF@30FPS, the SA-IDCT is clocked at 0.7 MHz, and the power consumption is 0.14 mW. Both the throughput and power consumption of the proposed SA-IDCT architecture are an order better than those of the existing SA-IDCT architectures.Index Terms-Discrete cosine transform (DCT), MPEG-4, video coding, VLSI.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.