Roger Porto scite author profile

Bampi

2009

Among video compression standards, H.264/AVC [1] is the latest. It reaches the highest compression rates when compared to previous standards, but at the cost of high computational complexity. This complexity makes it difficult to develop software applications that run on current processors when high-definition video is considered, so hardware implementations become essential. This work presents the architectural design for the variable block size motion estimation (VBSME) defined in the H.264/AVC standard. The architecture is based on the full search motion estimation algorithm and SAD calculation, and it produces the 41 motion vectors within a macroblock as specified in the standard. It was implemented with a standard cell methodology in 0.18 μm CMOS technology and reached a throughput of 34 1080 HD frames per second.
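As an illustration of where the 41 motion vectors come from, the sketch below enumerates every partition of a 16x16 macroblock and runs a full search with SAD as the matching cost. All names (`partitions`, `full_search`, `search_range`) are hypothetical, and computing each SAD directly from pixels is an assumption made for readability; the abstract does not describe how the architecture organizes its SAD computation.

```python
# Partition shapes (width, height) allowed inside a 16x16 H.264/AVC macroblock.
BLOCK_SHAPES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
MB = 16  # macroblock size in pixels


def partitions():
    """Yield (x, y, w, h) for every partition: 1 + 2 + 2 + 4 + 8 + 8 + 16 = 41."""
    for w, h in BLOCK_SHAPES:
        for y in range(0, MB, h):
            for x in range(0, MB, w):
                yield (x, y, w, h)


def full_search(cur_mb, ref_frame, mx, my, search_range):
    """Exhaustively test every displacement in the window.

    cur_mb is a 16x16 list of lists, ref_frame a 2D list of lists, and
    (mx, my) the macroblock position in the reference frame.
    Returns {partition: (best_sad, (dx, dy))}, one entry per partition.
    """
    height, width = len(ref_frame), len(ref_frame[0])
    best = {}
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ox, oy = mx + dx, my + dy
            # Skip candidates that fall outside the reference frame.
            if ox < 0 or oy < 0 or ox + MB > width or oy + MB > height:
                continue
            for part in partitions():
                x, y, w, h = part
                # Sum of absolute differences for this partition.
                cost = sum(
                    abs(cur_mb[y + j][x + i] - ref_frame[oy + y + j][ox + x + i])
                    for j in range(h)
                    for i in range(w)
                )
                if part not in best or cost < best[part][0]:
                    best[part] = (cost, (dx, dy))
    return best
```

Each of the 41 partitions keeps its own best displacement, which is what makes the search "variable block size": one pass over the window produces a motion vector for every block shape at once.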


High Throughput Multitransform and Multiparallelism IP for H.264/AVC Video Compression Standard

Güntzel

et al.

This paper presents the design of a high throughput multitransform and multiparallelism IP for the H.264/AVC standard. This solution supports the five H.264/AVC transforms and five different levels of parallelism. The proposed architecture was described in VHDL and synthesized to Altera Stratix and Xilinx Virtex-II Pro FPGAs and to TSMC 0.35 μm standard cells. The multitransform and multiparallelism architecture mapped to FPGAs could process from 124 million to 3.2 billion samples per second, depending on the parallelism level selected. The standard cells version could process from 218.7 million to 3.5 billion samples per second. These results indicate that the proposed solution is highly flexible and can be used in various H.264/AVC codecs with different performance requirements. The performance results of all experiments indicate that this architecture can be used in high definition applications, such as HDTV.

I. INTRODUCTION

H.264/AVC (also known as MPEG-4 Part 10) [1] is a video coding standard that has been developed to achieve significant improvements in compression performance over the existing standards.

The main blocks of an H.264/AVC encoder are the motion estimation, the motion compensation, the intra prediction, the loop filter, the entropy coder, the forward and inverse quantization and the forward and inverse transforms. The H.264/AVC decoder is formed by the entropy decoder, motion [...]

The forward transform block uses three different transforms, depending on the type of input data. These transforms are the 4x4 FDCT, the 4x4 forward Hadamard and the 2x2 forward Hadamard. The inverse transform block is also formed by three different transforms: the 4x4 IDCT, the 4x4 inverse Hadamard and the 2x2 inverse Hadamard. The 2x2 inverse and forward transforms are identical.

The Hadamard transforms are used to exploit a residual correlation between the results of the DCT transform when color samples are processed or when luma samples predicted from intra 16x16 mode are processed [3].

All H.264/AVC transforms use just integer arithmetic to avoid mismatch between forward and inverse transforms and to allow efficient hardware implementations [4].

This work presents the architectural design of a multitransform and multiparallelism IP that is able to calculate the five H.264/AVC transforms and that supports five different parallelism levels. Another characteristic of the proposed IP is that the number of input data bits is defined through an input parameter. This solution is highly flexible and can be used in H.264/AVC codec designs with a large spectrum of performance requirements.

The second section of this paper presents some related work. The third section presents the multitransform and its characteristics. Section four presents the input and output parallelism controller. The fifth section presents the synthesis results for Altera and Xilinx FPGAs and for standard cell technologies. The conc...


A FPGA Based Design of a Multiplierless and Fully Pipelined JPEG Compressor

Porto

Bampi

et al.

This paper presents the design and implementation of a multiplierless JPEG compressor for grayscale images. The modules of this architecture were fully pipelined and targeted to FPGA implementation. The designed architectures are detailed in this paper; they were described in VHDL, simulated and physically mapped to Altera Flex10KE FPGAs. The JPEG compressor pipeline has a minimum latency of 238 clock cycles, given the full modular pipeline depth. The minimum compressor period is 26.6 ns, and the compressor is able to process 37.6 million pixels per second. For example, the compressor can process a 640x480 pixel still image in 8.2 ms, reaching a maximum processing rate of 122.4 frames per second.
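The throughput figures above follow from the clock period, as the short calculation below shows. It assumes the pipeline accepts one pixel per clock cycle, which is inferred from the reported numbers and not stated in the abstract; the function names are hypothetical.

```python
def pixels_per_second(clock_period_ns):
    """Pixel rate, assuming one pixel enters the pipeline every clock cycle."""
    return 1e9 / clock_period_ns


def frame_time_ms(width, height, clock_period_ns):
    """Time to stream one frame through the compressor, ignoring latency."""
    return width * height / pixels_per_second(clock_period_ns) * 1e3


def frames_per_second(width, height, clock_period_ns):
    """Sustained frame rate for back-to-back frames."""
    return pixels_per_second(clock_period_ns) / (width * height)


rate = pixels_per_second(26.6)               # about 37.6 million pixels/s
time_ms = frame_time_ms(640, 480, 26.6)      # about 8.2 ms
fps = frames_per_second(640, 480, 26.6)      # about 122.4 frames/s
```

The 238-cycle pipeline latency adds roughly 6.3 μs to the first frame only, so it does not change these rounded values.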


High Throughput FPGA Based Architecture for H.264/AVC Inverse Transforms and Quantization

Güntzel

et al. 2006

Design space exploration on the H.264 4×4 Hadamard transform

Silva

et al. 2005