This paper presents novel circuits and an architecture for residue arithmetic, aimed at fast, area-efficient single-chip implementation of digital signal processors. The circuits follow an algorithmic approach rather than the conventional look-up table approach, which yields substantial area savings. They include a residue adder, a residue multiplier, a binary-to-residue converter, and a residue-to-binary converter. Based on these circuits, a prototype single-chip, 3 × 3, finite impulse response (FIR), variable-coefficient, linear-phase filter has been designed and fabricated in standard 2-µm CMOS technology. The filter has a pipelined architecture to increase throughput, and testability has been incorporated in the form of scan-path registers. An interesting feature of this combination of residue arithmetic and scan-path testing is the trade-off it makes available between the precision of the filter coefficients and that of the image data. The chip has a die size of 6.6 × 4.2 mm², dissipates 220 mW of power, and is synchronized with a 180-ns clock cycle.
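
For readers unfamiliar with residue arithmetic, the four operations named above can be illustrated in software. The following is a minimal sketch only: the moduli (7, 8, 9) and all function names are assumptions chosen for illustration, the residue-to-binary step uses the textbook Chinese remainder theorem, and nothing here is meant to reflect the circuits or the algorithmic approach of the paper itself.

```python
from math import prod

# Hypothetical pairwise-coprime moduli (an assumption, not from the paper).
# Their product, 504, is the dynamic range of this toy system.
MODULI = (7, 8, 9)

def to_residues(x, moduli=MODULI):
    """Binary-to-residue conversion: one remainder per modulus."""
    return tuple(x % m for m in moduli)

def residue_add(a, b, moduli=MODULI):
    """Residue addition: each channel is added independently, with no carries between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def residue_mul(a, b, moduli=MODULI):
    """Residue multiplication: each channel is multiplied independently."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def from_residues(r, moduli=MODULI):
    """Residue-to-binary conversion via the Chinese remainder theorem."""
    M = prod(moduli)
    total = 0
    for r_i, m_i in zip(r, moduli):
        M_i = M // m_i
        # pow(M_i, -1, m_i) is the modular inverse of M_i mod m_i (Python 3.8+).
        total += r_i * M_i * pow(M_i, -1, m_i)
    return total % M

a, b = to_residues(17), to_residues(23)
print(from_residues(residue_add(a, b)))  # 40
print(from_residues(residue_mul(a, b)))  # 391
```

Because every channel works on small, independent remainders, the per-channel operations can run in parallel, which is the property that makes residue arithmetic attractive for pipelined filter hardware. Results are only meaningful while they stay below the product of the moduli (504 in this sketch).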