This paper describes the Transmogrifier-2, a second-generation multi-FPGA system. The largest version of the system will comprise 16 boards that each contain two Altera 10K50 FPGAs, four I-cube interconnect chips, and up to 8 Mbytes of memory. The inter-FPGA routing architecture of the TM-2 uses a novel interconnect structure, a non-uniform partial crossbar, that provides a constant delay between any two FPGAs in the system. The TM-2 architecture is modular and scalable, meaning that systems of various sizes can be constructed from the same board while maintaining routability and the constant delay feature. Other features include a system-level programmable clock that allows single-cycle access to off-chip memory, and programmable clock waveforms with a resolution of 10 ns. The first Transmogrifier-2 boards have been manufactured and are functional. They have recently been used successfully in some simple graphics acceleration applications.
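As a quick check on the scale implied by the figures above, the following minimal sketch totals the per-board resources for a system of a given size. The per-board counts (two FPGAs, four interconnect chips, up to 8 Mbytes of memory) come from the text; the names `BoardSpec` and `system_totals` are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardSpec:
    """Per-board resources as stated in the text (class name is hypothetical)."""
    fpgas: int = 2                 # Altera 10K50 FPGAs per board
    interconnect_chips: int = 4    # I-cube interconnect chips per board
    max_memory_mbytes: int = 8     # upper bound on memory per board


def system_totals(num_boards: int, board: BoardSpec = BoardSpec()) -> dict:
    """Scale the per-board resources to a system built from identical boards."""
    if num_boards < 1:
        raise ValueError("a system needs at least one board")
    return {
        "fpgas": num_boards * board.fpgas,
        "interconnect_chips": num_boards * board.interconnect_chips,
        "max_memory_mbytes": num_boards * board.max_memory_mbytes,
    }


if __name__ == "__main__":
    # Largest configuration described in the text: 16 boards.
    print(system_totals(16))
```

Under these assumptions, the largest 16-board configuration would contain 32 FPGAs, 64 interconnect chips, and up to 128 Mbytes of memory; smaller systems built from the same board scale linearly.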
Introduction

Continuing advances in the density and speed of FPGAs have made them effective implementation vehicles for increasingly complex systems. Nevertheless, contemporary FPGAs lag semi-custom ASICs by a factor of 10 or more in density, and lack the system-level facilities, such as very large RAM and clocks, needed to implement large systems. Multi-FPGA field-programmable systems can be used not only to prototype these larger designs, but also as field-configurable compute accelerators. A moderate number of field-programmable systems have been described, and they can be roughly classified as either emulation systems or custom compute engines. Emulation systems tend to be targeted at a wide range of applications but suffer from low clock rates, while custom compute engines tend to have architectures that constrain the applications to those that fit the computational model offered by the hardware. The primary motivation for the Transmogrifier-2 (or TM-2, for short) is to provide a rapid prototyping system that offers high capacity and high clock rates, and that is flexible enough to implement a wide variety of systems.

We have previously constructed a small-scale rapid prototyping system, the Transmogrifier-1 [7]. It contained only 40K FPGA gates and 128 KB of RAM, and was capable of implementing only small systems. Further, because it was constructed from a standard rapid prototyping board plus other components and software, the TM-1 was difficult to use and required that the user be physically present to operate the machine. A number of designs were successfully implemented on the TM-1, and our positive and negative experiences with it largely shaped the goals for the TM-2 project.
University of Toronto

The remainder of this paper describes the goals of the TM-2 project and the resulting architecture and design. Section 2 describes the goals of this project. Section 3 describes the routing architecture developed for the TM-2. In Section 4, the influence of the goals and technical constraints on the detailed design is described. The software development system for the TM-2 is described in Section 5, and Section 6 conclud...