Programmable Active Memories (PAM) are a novel form of universal reconfigurable hardware co-processor. Based on Field-Programmable Gate Array (FPGA) technology, a PAM is a virtual machine, controlled by a standard microprocessor, which can be dynamically and indefinitely reconfigured into a large number of application-specific circuits. PAMs offer a new mixture of hardware performance and software versatility. We review the important architectural features of PAMs, through the example of DECPeRLe-1, an experimental device built in 1992. PAM programming is presented, in contrast to classical gate-array and full custom circuit design. Our emphasis is on large, code-generated synchronous systems descriptions; no compromise is made with regard to the performance of the target circuits. We exhibit a dozen applications where PAM technology proves superior, both in performance and cost, to every other existing technology, including supercomputers, massively parallel machines, and conventional custom hardware. The fields covered include computer arithmetic, cryptography, error correction, image analysis, stereo vision, video compression, sound synthesis, neural networks, high-energy physics, thermodynamics, biology and astronomy. At comparable cost, the computing power virtually available in a PAM in 1992 exceeds that of conventional processors by a factor of 10 to 1000, depending on the specific application. A technology shrink increases the performance gap between conventional processors and PAMs. By Noyce's law, we predict by how much the performance gap will widen with time.
Keywords: Programmable Active Memory, PAM, reconfigurable system, field-programmable gate array, FPGA.
We present some quantitative performance measurements for the computing power of Programmable Active Memories (PAM), as introduced by [BRV 89]. Based on Field Programmable Gate Array (FPGA) technology, the PAM is a universal hardware co-processor closely coupled to a standard host computer. The PAM can speed up many critical software applications running on the host, by executing part of the computations through a specific hardware design. The performance measurements presented are based on two PAM architectures and ten specific applications, drawn from arithmetic, algebra, geometry, physics, biology, audio and video. Each of these PAM designs proves as fast as any reported hardware or super-computer for the corresponding application. In cases where we could bring some genuine algorithmic innovation into the design process, the PAM has proved an order of magnitude faster than any previously existing system (see [SBV 91] and [S 92]).

PAM concept

Like any RAM memory module, a PAM is attached to the system bus of a host computer. The processor can write into, and read from, the PAM. Unlike a RAM, the PAM processes data between write and read instructions. The specific processing is determined by the content of its configuration memory. The host can change the PAM configuration by downloading a new design, within a few milliseconds.

We speed up a specific software application running on the host by executing its critical inner loop through an appropriate hardware design downloaded into the PAM. Ten examples of such designs and applications are presented below. Equipped with these ten designs, our PAM-Host system is ready to be compared with more conventional solutions (specific hardware, super-computer software) designed for processing the application.

Due to the great variety of the operations required by each application, quantitative performance comparison between existing computer architectures is a challenging art (see [HP 00]).
Traditional measurement units include Gips (billions of instructions per second, i.e. 1000 MIPS), Gops (billions of fixed-point arithmetic operations/sec) and Gflops (billions of floating-point operations/sec). None of these measures is particularly well-defined (which instructions? how many bits?) or relevant for every application. They are particularly ill-adapted to comparing the performance of hardware algorithms. Even the most appropriate Gbops measure (billions of boolean operations/sec) fails to
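
The PAM concept described in this abstract (the host writes data, the currently loaded design processes it, and the host reads results back) can be illustrated with a toy software model. This is only a sketch under stated assumptions: the names `ToyPAM`, `load_design`, `write` and `read` are hypothetical, and the model ignores timing, the bus protocol and all FPGA detail. It is not the authors' interface.

```python
from typing import Callable, List

Design = Callable[[List[int]], List[int]]


class ToyPAM:
    """Toy model of a PAM: a memory that processes data between
    host writes and host reads, according to a downloadable design.
    All names here are hypothetical, chosen for illustration only."""

    def __init__(self) -> None:
        # With no design loaded, the model behaves like plain RAM:
        # what is written is what is read back.
        self._design: Design = lambda data: data
        self._inputs: List[int] = []

    def load_design(self, design: Design) -> None:
        """Model the host downloading a new configuration.
        Assumption: reconfiguring clears any pending input data."""
        self._design = design
        self._inputs = []

    def write(self, word: int) -> None:
        """Model a host write into the PAM."""
        self._inputs.append(word)

    def read(self) -> List[int]:
        """Model a host read: returns the data as processed by the design."""
        return self._design(list(self._inputs))


if __name__ == "__main__":
    pam = ToyPAM()
    pam.write(3)
    pam.write(4)
    print(pam.read())  # plain-RAM behaviour: [3, 4]

    # "Download" a design that sums its inputs, then reuse the same device.
    pam.load_design(lambda data: [sum(data)])
    pam.write(3)
    pam.write(4)
    print(pam.read())  # [7]
```

The point of the sketch is the division of labour: the host code stays the same sequence of writes and reads, while the behaviour between them changes with the loaded design.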
We present various experiments in hardware/software design tradeoffs met in speeding up long integer multiplications. This work spans more than a year, with more than 12 different hardware designs tested and measured. To implement these designs, we rely on our PAM (for Programmable Active Memory, see [BRV]) technology, which gives us, in effect, a silicon foundry with a 50-millisecond turnaround time for implementing logic designs of up to 50K gates, fully equipped with fast local RAM and a host bus interface. First, we demonstrate how a simple hardware 512-bit integer multiplier coupled with a low-end workstation host yields performance on long arithmetic superior to that of the fastest computers for which we could obtain actual benchmark figures. Second, we specialize this hardware in order to speed up one specific application of long integer arithmetic, namely Rivest-Shamir-Adleman public-key cryptography [RSA]. We demonstrate how a single host driving 3 differently configured PAM boards delivers RSA encryption and decryption at more than 200 Kbits/sec for 512-bit keys. This beats the best working VLSI circuit built specifically for RSA by one order of magnitude.
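
To make the relationship between a fixed-width multiplier and RSA concrete, here is a minimal Python sketch. It assumes a hypothetical primitive `mul512` standing in for a hardware multiplier that accepts two operands below 2^512, builds a schoolbook long multiplication on top of it, and uses that for square-and-multiply modular exponentiation, the core operation of RSA. The abstract does not describe the authors' actual multiplier architecture or RSA algorithm, so the decomposition and every name below are assumptions for illustration.

```python
LIMB_BITS = 512                      # operand width of the assumed multiplier
LIMB_MASK = (1 << LIMB_BITS) - 1


def mul512(a: int, b: int) -> int:
    """Hypothetical stand-in for a hardware multiplier:
    product of two non-negative operands, each below 2**512."""
    assert 0 <= a <= LIMB_MASK and 0 <= b <= LIMB_MASK
    return a * b


def to_limbs(n: int) -> list:
    """Split a non-negative integer into 512-bit limbs, least significant first."""
    limbs = []
    while n:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
    return limbs or [0]


def long_multiply(x: int, y: int) -> int:
    """Schoolbook multiplication of non-negative integers,
    using only mul512 for the partial products."""
    result = 0
    for i, xi in enumerate(to_limbs(x)):
        for j, yj in enumerate(to_limbs(y)):
            result += mul512(xi, yj) << (LIMB_BITS * (i + j))
    return result


def modexp(base: int, exponent: int, modulus: int) -> int:
    """Right-to-left square-and-multiply: base**exponent mod modulus.
    The modular reduction is done in software here; a real design
    might well do it differently."""
    result = 1 % modulus
    base %= modulus
    while exponent:
        if exponent & 1:
            result = long_multiply(result, base) % modulus
        base = long_multiply(base, base) % modulus
        exponent >>= 1
    return result


if __name__ == "__main__":
    # Textbook-sized RSA example, far too small for real use.
    n, e, d = 3233, 17, 2753         # n = 61 * 53
    message = 65
    cipher = modexp(message, e, n)
    assert modexp(cipher, d, n) == message
```

For scale, 200 Kbits/sec with 512-bit blocks corresponds to roughly 400 modular exponentiations per second, each requiring on the order of several hundred 512-bit modular multiplications for a full-length exponent.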