Abstract: This paper presents a unified hardware architecture for the complete set of transforms in the H.264/AVC codec. The architecture maps the 2-D 4x4 forward/inverse transforms, the 2-D 4x4/2x2 Hadamard transforms, and the 1-D 8x8 forward/inverse transforms onto shared hardware consisting of 31 adder/subtractors, 7 adders, 6 subtractors, 34 shifters, 4 multiplexers, and 16 registers. It processes 16 inputs and 8 outputs in parallel for the 4x4 integer forward/inverse transforms, and 8 inputs and 8 outputs in parallel for the 8x8 integer forward/inverse transforms, using our proposed fast 4-step process. No register array is needed for the transpose operations of the 4x4 forward/inverse and 4x4/2x2 Hadamard transforms. With a throughput of 8 pixels/cycle, the proposed design completes the 8x8 and 4x4 transforms for one macroblock in 4:2:0 format in 50 clock cycles.
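
For readers unfamiliar with the transforms named above, the following is a minimal software sketch of the standard H.264/AVC 4x4 forward core transform, computed with additions, subtractions, and one-bit left shifts only. It illustrates why such transforms map naturally onto adders and shifters; it is not the architecture proposed in this paper, and the function names `forward_1d` and `forward_4x4` are hypothetical, chosen for illustration.

```python
def forward_1d(x0, x1, x2, x3):
    """1-D 4-point H.264 forward core transform as a two-stage butterfly.

    Equivalent to multiplying by the matrix
    [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]].
    """
    # Stage 1: sums and differences of mirrored input pairs.
    s0 = x0 + x3
    s1 = x1 + x2
    d0 = x0 - x3
    d1 = x1 - x2
    # Stage 2: the factor 2 is a one-bit left shift, so no multiplier is used.
    return (s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1))


def forward_4x4(block):
    """2-D 4x4 transform: 1-D pass over rows, then over columns."""
    rows = [forward_1d(*r) for r in block]
    # zip(*rows) walks the columns of the row-transformed block.
    cols = [forward_1d(*c) for c in zip(*rows)]
    # Transpose back so the result is indexed [row][column].
    return [list(r) for r in zip(*cols)]
```

In this software form the transpose between the row and column passes is explicit (the two `zip` calls); the scaling that the standard folds into quantization is omitted here.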