In this paper, we present a case study of our chip prototype of a 16-node 4x4 mesh NoC fabricated in 45nm SOI CMOS that aims to simultaneously optimize energy-latency-throughput for unicasts, multicasts and broadcasts. We first define and analyze the theoretical limits of a mesh NoC in latency, throughput and energy, then describe how we approach these limits through a combination of microarchitecture and circuit techniques. Our 1.1V 1GHz NoC chip achieves 1-cycle router-and-link latency at each hop and energy-efficient router-level multicast support, delivering 892Gb/s (87.1% of the theoretical bandwidth limit) at 531.4mW for a mixed traffic of unicasts and broadcasts. Through this fabrication, we derive insights that help guide our research, and we believe, will also be useful to the NoC and multicore research community.
Abstract-As technology scales, SoCs are increasing in core counts, leading to the need for scalable NoCs to interconnect the multiple cores on the chip. Given aggressive SoC design targets, NoCs have to deliver low latency, high bandwidth, at low power and area overheads. In this paper, we propose Single-cycle Multihop Asynchronous Repeated Traversal (SMART) NoC, a NoC that reconfigures and tailors a generic mesh topology for SoC applications at runtime. The heart of our SMART NoC is a novel low-swing clockless repeated link circuit embedded within the router crossbars, that allows packets to potentially bypass all the way from source to destination core within a single clock cycle, without being latched at any intermediate router. Our clockless repeater link has been proven in silicon in 45nm SOI. Results show that at 2GHz, we can traverse 8 mm within a single cycle, i.e. 8 hops with 1 mm cores. We implement the SMART NoC to layout and show that SMART NoC gives 60% latency savings, and 2.2X power savings compared to a baseline mesh NoC.
Abstract-Mesh NoCs are the most widely-used fabric in highperformance many-core chips today. They are, however, becoming increasingly power-constrained with the higher on-chip bandwidth requirements of high-performance SoCs. In particular, the physical datapath of a mesh NoC consumes significant energy. Low-swing signaling circuit techniques can substantially reduce the NoC datapath energy, but existing low-swing circuits involve huge area footprints, unreliable signaling or considerable system overheads such as an additional supply voltage, so embedding them into a mesh datapath is not attractive. In this paper, we propose a novel low-swing signaling circuit, a self-resetting logic repeater, to meet these design challenges. The SRLR enables single-ended low-swing pulses to be asynchronously repeated, and hence, consumes less energy than differential, clocked low-swing signaling. To mitigate global process variations while delivering high energy efficiency, three circuit techniques are incorporated. Fabricated in 45nm SOI CMOS, our 10mm SRLR-based lowswing datapath achieves 6.83Gb/s/μm bandwidth density with 40.4fJ/bit/mm energy at 4.1Gb/s data rate at 0.8V.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.