A 36-bit balanced moduli MAC architecture

Preethy, A.P.; Radhakrishnan, D.

doi:10.1109/mwscas.1999.867285

Cited by 8 publications

(9 citation statements)

References 6 publications


“…Our new MAC architecture, which is described in detail in [2] uses index-transform based approach. The relatively prime moduli in any arbitrary moduli set take any of the three forms $p$, $2^m$, and $p^m$, or a factorable modulus with any of these factors, where $p$ is prime and $m$ is any integer.…”

Section: The Multiply-accumulate Unit (mentioning)

confidence: 99%

A high performance RNS multiply-accumulate unit

Preethy

Radhakrishnan

Omondi

2001

Proceedings of the 11th Great Lakes Symposium on VLSI
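
The index-transform idea named in the statement above can be illustrated for the simplest of the listed modulus forms, a prime p. The following is a minimal sketch under stated assumptions, not the architecture described in [2]: the function names, the choice p = 19, and the primitive root g = 2 are all hypothetical and chosen only for illustration. It shows how a multiplication modulo p can be replaced by an addition of indices modulo p - 1 plus two table look-ups, with zero handled as a special case because it has no index.

```python
# Sketch only: index (discrete-log) multiplication modulo a prime.
# P, G and all function names are illustrative assumptions, not taken from [2].

P = 19  # assumed prime modulus
G = 2   # assumed primitive root of 19


def build_index_tables(p, g):
    """Return (index, power): power[k] = g^k mod p, index[g^k mod p] = k."""
    power = [pow(g, k, p) for k in range(p - 1)]
    if len(set(power)) != p - 1:
        raise ValueError(f"{g} is not a primitive root modulo {p}")
    index = {value: k for k, value in enumerate(power)}
    return index, power


def index_multiply(a, b, p, index, power):
    """Multiply a and b modulo p by adding their indices modulo p - 1."""
    a %= p
    b %= p
    if a == 0 or b == 0:
        # Zero has no index, so it must be detected separately.
        return 0
    return power[(index[a] + index[b]) % (p - 1)]


INDEX, POWER = build_index_tables(P, G)
```

The sketch covers only the prime case; moduli of the forms $2^m$ and $p^m$ need additional handling that is not shown here.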

“…That is, we will compute $C_J(n)$ wholly within the RNS. n is represented as (5,8,3,4,14,17) by this set of moduli. First, compute $C_J(n)$ moduli 7, 17, and 23, and $C_K(n)$ moduli 11, 13, and 19 using (15) and (17):…”

Section: RNS Scaling Methods (mentioning)

confidence: 99%
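
The residue representation quoted above can be checked with a short sketch. It assumes the six moduli named in the statement are taken in ascending order (7, 11, 13, 17, 19, 23), which the excerpt does not state explicitly; the function names are hypothetical. The decoding uses the standard Chinese Remainder Theorem and is not the core-function method of the cited paper.

```python
# Sketch only: forward and reverse RNS conversion for the quoted example.
# The ascending ordering of MODULI is an assumption; function names are illustrative.
from math import prod

MODULI = (7, 11, 13, 17, 19, 23)
RESIDUES = (5, 8, 3, 4, 14, 17)  # representation of n quoted in the statement


def to_rns(n, moduli):
    """Residues of n with respect to each modulus."""
    return tuple(n % m for m in moduli)


def from_rns(residues, moduli):
    """Decode residues with the Chinese Remainder Theorem (moduli pairwise co-prime)."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        # pow(x, -1, m) is the modular inverse (Python 3.8+).
        total += r * m_i * pow(m_i, -1, m)
    return total % big_m


n = from_rns(RESIDUES, MODULI)
```

Under that ordering assumption the residues decode to n = 1,859,107, and `to_rns(n, MODULI)` reproduces the quoted tuple.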

“…• They offer high-performance implementations of arithmetic-intensive applications at reduced power supply voltages, important for mobile and wearable computer and communication systems [4]
• They avoid lengthy on-chip interconnects, which now represent the major constraint on the realisation of high-performance digital VLSI circuits [5]
• They afford hardware-efficient complex multipliers ("QRNS multiplication") comprising two independent multiplications instead of four multiplications and two additions [1]
• The component arithmetic operations in an RNS implementation can, without exception, be reduced to short adders and small look-up tables [1]

All the items in the above list are applicable to custom VLSI implementations, and the last two also apply advantageously to FPGA implementations [6,7]. Recent industrial interest in RNS confirms the existence and scale of problems faced in implementing DSP algorithms in digital microelectronic fabrics at high clock rates but with low power consumption.…”

Section: Background and Motivation (mentioning)

confidence: 99%
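
The "QRNS multiplication" point in the list above can be made concrete with a small sketch. It relies on the standard quadratic-RNS mapping, which requires a modulus p for which -1 has a square root j (for a prime, p ≡ 1 mod 4). The values p = 13 and j = 5 and all function names are assumptions chosen for illustration; nothing here is taken from the cited papers.

```python
# Sketch only: quadratic RNS (QRNS) complex multiplication for one modulus.
# P = 13 and J = 5 are illustrative assumptions (5 * 5 = 25 = -1 mod 13).

P = 13
J = 5


def to_qrns(a, b, p=P, j=J):
    """Map a + i*b to the pair (a + j*b, a - j*b) modulo p."""
    return ((a + j * b) % p, (a - j * b) % p)


def from_qrns(z1, z2, p=P, j=J):
    """Invert the mapping: a = (z1 + z2)/2, b = (z1 - z2)/(2j) modulo p."""
    inv2 = pow(2, -1, p)
    inv2j = pow(2 * j, -1, p)
    return ((z1 + z2) * inv2 % p, (z1 - z2) * inv2j % p)


def qrns_multiply(x, y, p=P):
    """Two independent modular multiplications, one per component."""
    return (x[0] * y[0] % p, x[1] * y[1] % p)


def direct_complex_multiply(a, b, c, d, p=P):
    """Reference: (a + ib)(c + id) with four multiplications and two additions."""
    return ((a * c - b * d) % p, (a * d + b * c) % p)
```

For any operands, `from_qrns(*qrns_multiply(to_qrns(a, b), to_qrns(c, d)))` should agree with `direct_complex_multiply(a, b, c, d)`, since j² ≡ -1 plays the role of i².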

“…Finally, add $\Delta C(n)$ to the $C_K(n)$ values to obtain the remaining scaled moduli:

$C_J(n) \bmod 11 = (C_K(n) + \Delta C(n)) \bmod 11 = (5 + 8) \bmod 11 = 2$
$C_J(n) \bmod 13 = (C_K(n) + \Delta C(n)) \bmod 13 = (0 + 8) \bmod 13 = 8$
$C_J(n) \bmod 19 = (C_K(n) + \Delta C(n)) \bmod 19 = (11 + 8) \bmod 19 = 0$

Hence, the RNS value of n = 1,859,107 (or (5,8,3,4,14,17) in RNS format) after being approximately scaled by 2717 is $C_J(n) = (5, 2, 8, 4, 0, 17)$.…”

Section: Worked Example of Proposed RNS Scaling Algorithm (mentioning)

confidence: 99%
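
The final step of the quoted example can be replayed numerically. The sketch below takes the $C_K(n)$ values, the correction 8, and the scale factor 2717 = 11 × 13 × 19 from the statement; the ascending moduli ordering (7, 11, 13, 17, 19, 23) and the function names are assumptions. It only checks the last additions and compares the quoted result with ordinary integer division; it does not implement the core-function computation itself.

```python
# Sketch only: replay the last step of the quoted worked example.
# Moduli ordering and function names are assumptions for illustration.

MODULI = (7, 11, 13, 17, 19, 23)
C_K = {11: 5, 13: 0, 19: 11}   # C_K(n) values quoted in the statement
DELTA_C = 8                    # correction term quoted in the statement
N = 1_859_107
SCALE = 11 * 13 * 19           # 2717


def add_delta(c_k, delta):
    """Add the correction to each C_K(n) residue modulo its own modulus."""
    return {m: (value + delta) % m for m, value in c_k.items()}


def to_rns(n, moduli):
    """Residues of n with respect to each modulus."""
    return tuple(n % m for m in moduli)


remaining = add_delta(C_K, DELTA_C)         # residues for moduli 11, 13, 19
reference = to_rns(N // SCALE, MODULI)      # residues of floor(n / 2717) = 684
```

In this instance the quoted result (5, 2, 8, 4, 0, 17) coincides with the residues of floor(n / 2717) = 684. The statement describes the scaling as approximate, so exact agreement with the floor should not be assumed in general.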

“…as C(n) ≈ C(M)) because the core is being extracted effectively over modulo M, not modulo C(M). Consequently, there is no ambiguity due to aliassing arising from equation (4). However, aliassing can occur for values of n ≈ 0 (i.e.…”

Section: Examples of Ambiguity (mentioning)

confidence: 99%

Scaling an RNS number using the core function

Burgess

16th IEEE Symposium on Computer Arithmetic, 2003. Proceedings.

This paper introduces a method for extracting the core of a Residue Number System (RNS) number within the RNS, this affording a new method for scaling RNS numbers. Suppose an RNS comprises a set of co-prime moduli, $m_i$, with $\prod m_i = M$. This paper describes a method for approximately scaling such an RNS number by a subset of the moduli, $\prod m_j = M_J \approx \sqrt{M}$, with the characteristic that all computations are performed using the original moduli and one other non-maintained short wordlength modulus.

Background and Motivation

The Residue Number System (RNS) has great potential for accelerating arithmetic operations, achieved by breaking operands into several smaller residues and operating on the residues independently and in parallel. RNS implementations were studied extensively in the 1970's, particularly for DSP applications [1], and led to Inmos' production of an RNS 2-D convolver chip in 1989 [2]. However, wider take-up of RNS for DSP was limited because of a number of fundamental difficulties:

• Conversion to binary representation from RNS is difficult (the inverse operation is simple)
• Direct magnitude comparison and sign determination of RNS numbers is impossible
• Square root operations are not available, and division operations, although available [3], are not practical due to their complexity

These difficulties place major constraints on the possible applications of RNS arithmetic.

Recently, however, DSP chips using RNS have enjoyed something of a renaissance for a variety of reasons:

• They offer high-performance implementations of arithmetic-intensive applications at reduced power supply voltages, important for mobile and wearable computer and communication systems [4]
• They avoid lengthy on-chip interconnects, which now represent the major constraint on the realisation of high-performance digital VLSI circuits [5]
• They afford hardware-efficient complex multipliers ("QRNS multiplication") comprising two independent multiplications instead of four multiplications and two additions [1]
• The component arithmetic operations in an RNS implementation can, without exception, be reduced to short adders and small look-up tables [1]

All the items in the above list are applicable to custom VLSI implementations, and the last two also apply advantageously to FPGA implementations [6,7]. Recent industrial interest in RNS confirms the existence and scale of problems faced in implementing DSP algorithms in digital microelectronic fabrics at high clock rates but with low power consumption. For example, reference [8] describes an FIR filter in RNS designed by Texas Instruments because of its low-power capability, and reference [9] discusses a general-purpose DSP engine developed by Siemens that incorporates an RNS vector processor with a considerably higher data processing bandwidth than its binary counterpart.

The fundamental difficulties with RNS arithmetic listed earlier have been overcome to some extent by recent innovations in RNS theory. For example, the core function has been shown to be advantageous in converting an RNS nu...
