This paper introduces a method for extracting the core of a Residue Number System (RNS) number within the RNS, which affords a new method for scaling RNS numbers. Suppose an RNS comprises a set of co-prime moduli, m_i, with ∏ m_i = M. This paper describes a method for approximately scaling such an RNS number by a subset of the moduli, ∏ m_j = M_J ≈ √M, with the characteristic that all computations are performed using the original moduli and one other non-maintained short-wordlength modulus.
Background and Motivation

The Residue Number System (RNS) has great potential for accelerating arithmetic operations, achieved by breaking operands into several smaller residues and operating on the residues independently and in parallel. RNS implementations were studied extensively in the 1970's, particularly for DSP applications [1], and led to Inmos' production of an RNS 2-D convolver chip in 1989 [2]. However, wider take-up of RNS for DSP was limited because of a number of fundamental difficulties:

• Conversion to binary representation from RNS is difficult (the inverse operation is simple)
• Direct magnitude comparison and sign determination of RNS numbers is impossible
• Square root operations are not available, and division operations, although available [3], are not practical due to their complexity

These difficulties place major constraints on the possible applications of RNS arithmetic.

Recently, however, DSP chips using RNS have enjoyed something of a renaissance for a variety of reasons:

• They offer high-performance implementations of arithmetic-intensive applications at reduced power supply voltages, important for mobile and wearable computer and communication systems [4]
• They avoid lengthy on-chip interconnects, which now represent the major constraint on the realisation of high-performance digital VLSI circuits [5]
• They afford hardware-efficient complex multipliers ("QRNS multiplication") comprising two independent multiplications instead of four multiplications and two additions [1]
• The component arithmetic operations in an RNS implementation can, without exception, be reduced to short adders and small look-up tables [1]

All the items in the above list are applicable to custom VLSI implementations, and the last two also apply advantageously to FPGA implementations [6,7].
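As an illustration of the residue-wise parallelism described at the start of this section, the following minimal Python sketch converts integers to residues, adds and multiplies them channel by channel, and recovers the result with the Chinese Remainder Theorem. The moduli set (7, 11, 13, 15) and the function names `to_rns`, `rns_add`, `rns_mul` and `from_rns` are assumptions chosen for illustration and are not taken from the paper. The sketch shows ordinary RNS arithmetic only, not the core-extraction or scaling method proposed here.

```python
from math import prod

# Hypothetical moduli set, chosen only for illustration (pairwise co-prime).
MODULI = (7, 11, 13, 15)
M = prod(MODULI)  # dynamic range of this RNS: 15015


def to_rns(x, moduli=MODULI):
    """Forward conversion: reduce x modulo each modulus independently."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; no carries pass between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication; each channel is a short-wordlength operation."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Reverse conversion via the Chinese Remainder Theorem.

    This is the step that requires a full-range accumulation modulo M,
    unlike the channel-wise operations above.
    """
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        m_hat = total // m              # product of all the other moduli
        inv = pow(m_hat, -1, m)         # multiplicative inverse of m_hat mod m
        x += r * m_hat * inv
    return x % total


if __name__ == "__main__":
    a, b = to_rns(123), to_rns(45)
    print(a)                            # (4, 2, 6, 3)
    print(from_rns(rns_add(a, b)))      # 168
    print(from_rns(rns_mul(a, b)))      # 5535, valid because 5535 < M
```

Forward conversion and the arithmetic operations each touch one modulus at a time, whereas `from_rns` needs a sum over all channels reduced modulo M. This asymmetry is the reason the first difficulty in the list above calls conversion from RNS to binary the hard direction. The sketch requires Python 3.8 or later for `math.prod` and the three-argument `pow` with exponent -1.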
Recent industrial interest in RNS confirms the existence and scale of problems faced in implementing DSP algorithms in digital microelectronic fabrics at high clock rates but with low power consumption. For example, reference [8] describes an FIR filter in RNS designed by Texas Instruments because of its low-power capability, and reference [9] discusses a general-purpose DSP engine developed by Siemens that incorporates an RNS vector processor with a considerably higher data processing bandwidth than its binary counterpart.

The fundamental difficulties with RNS arithmetic listed earlier have been overcome to some extent by recent innovations in RNS theory. For example, the core function has been shown to be advantageous in converting an RNS nu...