In this paper, an efficient algorithm for the concurrent computation of two real multiplications and/or two real additions, usually required for high-throughput image and video coding applications, is described. The proposed algorithm is mapped onto a novel concurrent dual multiplier-dual adder cell based on carry-save 4:2 compressors. A detailed performance analysis of the proposed cell shows reductions ranging from 15% to 60% in the computation time and area when compared with conventional processing elements, making it highly attractive for VLSI implementation.
I. INTRODUCTION
In recent years, there has been a great deal of interest in the application of multi-dimensional (notably 1-D and 2-D) DSP functions, such as FIR/IIR filtering, correlation, DCT, matrix-matrix multiplication, etc., for solving many problems in the areas of image transformation, speech processing, echo cancellation, image processing for video applications, CAM, medical imaging, etc. [1]. Several application-specific systolic array processors (SAPs) and SIMD processors derived through block, state-space, signal flow-graph, and transfer function models are reported in the literature for an efficient implementation of these computationally intensive DSP functions [2-6]. Processing elements (PEs) comprising two or three real multipliers and/or two adder modules form the fundamental computing blocks of the SAPs [2,3] and the SIMD processors [4-6], resulting in excessive hardware complexity and increased computation time.
The multi-purpose inner-product processor cell proposed in [7] is aimed at an efficient computation of the above DSP functions. However, it incorporates separate Booth-recoded multipliers along with a chain of adder modules based on redundant arithmetic for summing the carry-save results from the multipliers' outputs, followed finally by a fast adder module for redundant-to-binary conversion. This approach offers a speed-up in computation time when compared with its earlier counterparts, but at the expense of increased area and routing complexity. Recently, a novel multiplier-accumulator cell based on the modified Booth's algorithm (MBA) was proposed in [8].
However, it is capable of performing concurrent computation of only one real multiplication and one addition/subtraction operating on signed 2's complement operands [8].
In this paper, we propose a novel computational primitive, the concurrent dual multiplier-dual adder cell (CNDMDAC) architecture, based on carry-save 4:2 compressors, which performs concurrent computation of two real multiplications and/or two real full-precision accumulations. As a result, approximate reductions ranging from 15% to 60% are obtained in the computation time and the area, along with a reduced number of interconnections, when compared with conventional PEs. To the best of the authors' knowledge, the CNDMDAC is the first of its kind reported in the literature.
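As a rough illustration of the carry-save 4:2 compression on which the proposed cell is based, the following Python sketch models a generic 4:2 compressor at the bit level and at the word level, and uses it to accumulate the partial products of one multiplication. It is not the CNDMDAC itself: the function names (full_adder, compressor_4_2, compress_words, multiply_carry_save) are hypothetical, the two-full-adder realization of the compressor is one common textbook form and is assumed here, and operands are assumed to be unsigned integers rather than the signed operands a practical cell would handle.

```python
def full_adder(x, y, z):
    """3:2 counter on single bits: returns (sum, carry)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Bit-level 4:2 compressor built from two full adders (assumed form).

    Satisfies x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)   # cout does not depend on cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def compress_words(a, b, c, d):
    """Word-level view of a row of 4:2 compressors.

    Reduces four non-negative integers to a carry-save pair (S, C)
    with S + C == a + b + c + d, without propagating carries.
    """
    s1 = a ^ b ^ c
    c1 = ((a & b) | (a & c) | (b & c)) << 1
    s2 = s1 ^ c1 ^ d
    c2 = ((s1 & c1) | (s1 & d) | (c1 & d)) << 1
    return s2, c2


def multiply_carry_save(x, y):
    """Unsigned multiply: two partial products are folded into the
    running carry-save pair per step; one carry-propagate add at the end."""
    s, c = 0, 0
    i = 0
    while y >> i:
        pp0 = (x << i) if (y >> i) & 1 else 0
        pp1 = (x << (i + 1)) if (y >> (i + 1)) & 1 else 0
        s, c = compress_words(s, c, pp0, pp1)
        i += 2
    return s + c   # the only carry-propagating addition
```

In this sketch the running result stays in redundant (sum, carry) form until the final addition, which is the property that makes 4:2 compressors attractive for fusing multiplication and accumulation in a single array.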
II. PROPOSED CONCURRENT VLSI ARCHITECTURE
Algorithm Formulation
While the CNDMDAC cell is flexi...