This paper describes the design of a 2D discrete cosine transform (DCT), which is widely used in image and video compression algorithms. The objective is to design a fully parallel distributed arithmetic (DA) architecture for the 2D DCT, to be implemented on a field programmable gate array (FPGA). The DCT requires a large number of mathematical computations, including multiplications and accumulations. Multipliers consume considerable power and area; hence they are eliminated entirely from the proposed design. Distributed arithmetic reorganizes a sum of products (vector dot product) at the bit level so that no explicit multiplications are needed. DA is well suited to FPGA designs because it reduces the size of the multiply-and-accumulate hardware. The fully parallel approach increases the speed of the proposed design. In this work, both an existing DA architecture for the 2D DCT and the proposed area-efficient, fully parallel DA architecture for the 2D DCT are realized. Simulation and synthesis are performed using Xilinx ISE.

Index Terms: 2D discrete cosine transform, parallel architecture, distributed arithmetic, FPGA.
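
To make the distributed arithmetic idea concrete, the following is a minimal software sketch of the classic lookup-table form of DA for a single dot product. It is an illustration only and not the architecture proposed in this paper: the function names `build_lut` and `da_dot`, the 8-bit word length, and the use of integer (fixed-point) coefficients are all assumptions made for the example. The inputs are processed one bit position at a time; the bits at that position form an address into a table of precomputed coefficient sums, and the table outputs are combined with shifts and additions only.

```python
def build_lut(coeffs):
    """Precompute every possible sum of coefficients.

    Bit i of the address selects whether coeffs[i] is included.
    (Hypothetical helper; integer coefficients are assumed.)
    """
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]


def da_dot(coeffs, xs, nbits=8):
    """Dot product of coeffs and xs without multiplying them together.

    Each x must fit in an nbits-bit two's complement word (assumed 8 bits).
    """
    lut = build_lut(coeffs)
    mask = (1 << nbits) - 1
    words = [x & mask for x in xs]  # two's complement bit patterns

    acc = 0
    for j in range(nbits):
        # Gather bit j of every input into one table address.
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> j) & 1) << i

        partial = lut[addr] << j  # weight the lookup by 2**j using a shift
        if j == nbits - 1:
            acc -= partial  # the sign bit carries negative weight
        else:
            acc += partial
    return acc


# Example: 3*10 + (-5)*(-4) + 7*0 + 2*127 = 304
print(da_dot([3, -5, 7, 2], [10, -4, 0, 127]))
```

In hardware, the table would typically map to FPGA lookup resources and the loop over bit positions would either run serially over clock cycles or be unrolled for a parallel design; the sketch above only demonstrates the arithmetic equivalence.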