Abstract: Integrating networks-on-chip (NoCs) on FPGAs can improve device scalability and facilitate design by abstracting communication and simplifying timing closure, not only between modules in the FPGA fabric but also with large "hard" blocks such as high-speed I/O interfaces. We propose mixed and hard NoCs that add less than 1% area to large FPGAs and run 5-6× faster than the soft NoC equivalent. A detailed power analysis, per NoC component, shows that routers consume 14× less power when implemented hard compared t…
“…Kuon and Rose [6] compared between FPGA and ASIC using different benchmarks. Abdelfattah and Betz, made that comparison [7], [8] on router's sub-modules level using ASIC-specific router. In [9] a comparative review between FPGA-specific NoC and ASIC-specific one has been [4] covered.…”
Section: Introduction (mentioning)
confidence: 99%
“…We make a similar comparison to [7] and [8] for SoftHard utilization using FPGA-specific design CONNECT to provide design suggestions of which modules are suitable to be reconfigurable or to be harden, and give best design decisions for FPGA-specific router's sub-modules. Also to investigate whether the soft implementation of FPGA-specific NoC would give better results than ASIC-specific NoC or not.…”
Including networks-on-chip (NoCs) within FPGAs has become necessary to overcome the problems of point-to-point interconnect schemes. Doing so enables interfacing with high-speed I/Os and partial dynamic reconfiguration (PDR), reduces compile time, and improves system performance. We compared FPGA-specific NoC components in soft and hard implementations and analyzed the efficiency gap between the two technologies to derive design constraints in this space. The input module, which includes memory buffers implemented using block RAMs (BRAMs), has the smallest gaps: 1.8x area, 2.9x delay and 5.3x power. The switch has the largest gaps: 90x area, 7x delay and 53x power. If the router is implemented entirely hard, this saves 9x area, 3.7x delay and 12x power at the expense of flexibility (reconfigurability). By comparing our results with the same flow on an ASIC-specific router, we show that using an FPGA-specific NoC design improves the utility by a factor of 3x in area with a slight increase in delay.
“…Since 10 6-LUTs fit into Stratix LABs, we also assume that there are always exactly 10 LUTs per LAB. Our HNS switch configurations consume only 2.1% of the LUT area available, and little to no registers. When compared to the FPGA implementation of the MDN switch [19], the HNS consumes 6.9×-8.4× less area.…”
Section: Hardware Cost (mentioning)
confidence: 99%
“…The hard NoC provides a fixed mesh topology and uses no programmable interconnect, thus easing the design and routing of the circuit logic and providing the fastest and most area- and power-efficient design [4]. The mixed NoC has the advantage of flexible, programmable topologies.…”
Communications systems make heavy use of FPGAs; their programmability allows system designers to keep up with emerging protocols and their high-speed transceivers enable high bandwidth designs. While FPGAs are extensively used for packet parsing, inspection and classification, they have seen less use as the switch fabric between network ports. However, recent work has proposed embedding a network-on-chip (NoC) as a new "hard" resource on FPGAs and we show that by properly leveraging such a NoC one can create a very efficient yet still highly programmable network switch. We compare a NoC-based 16×16 network switch for 10-Gigabit Ethernet traffic to a recent innovative FPGA-based switch fabric design. The NoC-based switch not only consumes 5.8× less logic area, but also reduces latency by 8.1×. We also show that using the FPGA's programmable interconnect to adjust the packet injection points into the NoC leads to significant performance improvements. A routing algorithm tailored to this application is shown to further improve switch performance and scalability. Overall, we show that an FPGA with a low-cost hard 64-node mesh NoC with 64-bit links can support a 16×16 switch with up to 948 Gbps in aggregate bandwidth, roughly matching the transceiver bandwidth on the latest FPGAs.
“…For example, consider 9 metal layers available in TSMC 65nm technology, 2 to 6 layers might be reserved for local wiring, clock and power; we only need 2 layers to freely route link wires: one for horizontal and one for vertical wires. Previous work [27] uses a 1.2um wide bit line (0.6um width with 0.6um spacing) to meet 1GHz timing. Therefore a typical 64-bit wide bidirectional link has a width of 153.6um.…”
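The 153.6um figure in the snippet above can be reproduced with a short calculation. This is a minimal sketch, assuming that "bidirectional" means one wire per bit in each direction (2 × 64 wires) and that all wires sit side by side on one layer at the quoted 1.2um pitch; the function name and parameters are illustrative and do not come from the cited paper.

```python
def link_width_um(bits, wire_um=0.6, spacing_um=0.6, bidirectional=True):
    """Estimate the routed width of a NoC link in micrometres.

    Assumption (not stated in the source): a bidirectional link uses one
    wire per bit per direction, all laid out at a uniform pitch.
    """
    pitch_um = wire_um + spacing_um            # 0.6 + 0.6 = 1.2 um per bit line
    wires = bits * (2 if bidirectional else 1)  # 64 bits -> 128 wires
    return wires * pitch_um


width = link_width_um(64)
print(f"{width:.1f} um")  # prints "153.6 um"
```

Under the same assumptions, a unidirectional 64-bit link would need half that width, 76.8um.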
Abstract: Network topology plays a vital role in chip design; it largely determines network cost (power and area) and significantly impacts communication performance in manycore architectures. Conventional topologies such as a 2D mesh have drawbacks including high diameter as the network scales and poor load balancing for the center nodes. We propose a methodology to design random topologies for on-chip networks. Random topologies provide better scalability in terms of network diameter and provide inherent load balancing. As a proof-of-concept for random on-chip topologies, we explore a novel set of networks, dodecs, and illustrate how they reduce network diameter with randomized low-radix router connections. While a 4 × 4 mesh has a diameter of 6, our dodec has a diameter of 4 with lower cost. By introducing randomness, dodec networks exhibit more uniform message latency. By using low-radix routers, dodec networks simplify the router microarchitecture and attain 20% area and 22% power reduction compared to mesh routers while delivering the same overall application performance for PARSEC.
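The mesh baseline quoted in the abstract above (a 4 × 4 mesh has a diameter of 6) can be checked with a breadth-first search over the mesh graph. This is a minimal sketch of the diameter calculation only; the helper name `mesh_diameter` is illustrative, and the code does not attempt to construct the dodec topology, whose wiring is not described here.

```python
from collections import deque


def mesh_diameter(rows, cols):
    """Return the diameter (longest shortest path, in hops) of a 2D mesh."""

    def farthest_hops(start):
        # Plain BFS from `start`, recording hop counts to every node.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in dist:
                    dist[(nr, nc)] = dist[(r, c)] + 1
                    queue.append((nr, nc))
        return max(dist.values())

    # The diameter is the largest eccentricity over all start nodes.
    return max(farthest_hops((r, c)) for r in range(rows) for c in range(cols))


print(mesh_diameter(4, 4))  # prints 6
```

For a mesh the result matches the closed form (rows - 1) + (cols - 1), the corner-to-corner distance, which is why the diameter grows as the network scales.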