An energy-efficient network-on-chip (NoC) is presented for possible application to high-performance system-on-chip (SoC) design. It incorporates heterogeneous intellectual properties (IPs) such as multiple RISCs and SRAMs, a reconfigurable logic array, an off-chip gateway, and a 1.6-GHz phase-locked loop (PLL). Its hierarchically star-connected on-chip network provides the integrated IPs, which operate at different clock frequencies, with a packet-switched serial-communication infrastructure. Various low-power techniques, such as low-swing signaling, a partially activated crossbar, serial link coding, and clock frequency scaling, are devised and applied to achieve power-efficient on-chip communication. The 5 × 5 mm² chip containing all the above features is fabricated in a 0.18-µm CMOS process and has been successfully measured and demonstrated on a system evaluation board running multimedia applications. The fabricated chip can deliver 11.2-GB/s aggregated bandwidth at a 1.6-GHz signaling frequency. The chip consumes 160 mW in total, and the on-chip network dissipates less than 51 mW.
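As a rough sanity check on the headline figures, the network power can be divided by the aggregate bandwidth to get an average energy per delivered bit. This is a back-of-the-envelope sketch, assuming the 51 mW and 11.2 GB/s figures refer to the same operating point and that 1 GB = 10⁹ bytes; since 51 mW is an upper bound, the result (about 0.57 pJ/bit) is an upper bound too. The names `NETWORK_POWER_W`, `AGGREGATE_BW_BYTES`, and `energy_per_bit_pj` are hypothetical.

```python
# Back-of-the-envelope energy per bit for the on-chip network.
# Assumptions: both figures describe the same operating point, and
# 1 GB = 1e9 bytes. Names here are illustrative, not from the paper.

NETWORK_POWER_W = 51e-3       # "less than 51 mW" for the on-chip network
AGGREGATE_BW_BYTES = 11.2e9   # 11.2 GB/s aggregated bandwidth


def energy_per_bit_pj(power_w: float, bytes_per_s: float) -> float:
    """Average energy per delivered bit, in picojoules."""
    bits_per_s = bytes_per_s * 8
    return power_w / bits_per_s * 1e12


if __name__ == "__main__":
    e = energy_per_bit_pj(NETWORK_POWER_W, AGGREGATE_BW_BYTES)
    print(f"{e:.2f} pJ/bit")  # prints "0.57 pJ/bit"
```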
SoC design is composed of two major parts: the design of the computing cores and the design of their communication architecture. As die sizes and the number of subsystems on a chip increase, the power consumed by the interconnection structures, including clocks, takes a significant portion of the overall power budget. This calls for techniques that reduce the energy consumed in on-chip communication while satisfying quality-of-service (QoS) requirements such as bandwidth, latency, or reliability. Recently, on-chip networks (OCNs) have been studied actively to address these communication problems [1][2][3][4], but their implementations so far have not been energy-efficient [1][3]. In this paper, we report the successful implementation of a 51mW 1.6GHz hierarchical star-connected on-chip network that supports 11.2GB/s of bandwidth using various low-power circuit techniques.
The star topology guarantees a constant, minimal switch hop count between every pair of communicating IPs. However, the 1-level flat star topology [1] shown in Fig. 8.2.1a requires a large number of capacitive global wires, which may cause long latency and large power dissipation. Figure 8.2.1b shows a hierarchical star-connected SoC composed of several clusters of tightly connected IPs that exploit communication locality. Intra-cluster local links provide high bandwidth with shorter latency and lower energy consumption, while inter-cluster global links achieve higher link utilization through link sharing.
Figure 8.2.2 shows the OCN-based SoC platform, which is applicable to low-power mobile devices [5]. The OCN has two separate networks, a forward network and a backward network, which configure the Master-to-Slave path and the Slave-to-Master path, respectively [1]. To reduce the area of the OCN, 100MHz packets are serialized by the Up-Sampler using a 1.6GHz network clock before transmission and then deserialized by the Down-Sampler upon arrival. To deserialize a packet without a globally synchronized clock, a strobe signal is transmitted together with the packet.
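The Up-Sampler/Down-Sampler scheme can be illustrated with a behavioural (not RTL) model. The 1.6GHz/100MHz clock ratio gives 16 link cycles per packet cycle; everything else here is an assumption not stated in the text: the link is `packet_bits / 16` wires wide, slices are sent least-significant first, and the strobe toggles once per slice so that every strobe transition marks one valid slice. The functions `up_sample`, `transmit`, and `receive` are hypothetical names.

```python
# Behavioural sketch of strobe-based 16:1 serialization (not the paper's RTL).
# Assumptions: link width = packet_bits / SER_RATIO wires, LSB slice first,
# strobe starts idle-low and toggles once per slice (both edges carry data).

IP_CLOCK_HZ = 100_000_000        # 100 MHz packet (IP-side) rate
NET_CLOCK_HZ = 1_600_000_000     # 1.6 GHz network clock
SER_RATIO = NET_CLOCK_HZ // IP_CLOCK_HZ  # 16 link cycles per packet


def up_sample(packet: int, packet_bits: int, ratio: int = SER_RATIO) -> list[int]:
    """Cut a packet into `ratio` equal slices, least-significant slice first."""
    if packet_bits % ratio:
        raise ValueError("packet width must be a multiple of the serialization ratio")
    width = packet_bits // ratio
    mask = (1 << width) - 1
    return [(packet >> (i * width)) & mask for i in range(ratio)]


def transmit(packet: int, packet_bits: int, ratio: int = SER_RATIO) -> list[tuple[int, int]]:
    """Drive (strobe, data) pairs; the strobe toggles on every slice."""
    slices = up_sample(packet, packet_bits, ratio)
    return [((i + 1) % 2, s) for i, s in enumerate(slices)]


def receive(samples: list[tuple[int, int]], packet_bits: int) -> int:
    """Down-sample: capture data on every strobe transition, then reassemble."""
    slices, prev = [], 0
    for strobe, data in samples:
        # Strobe and data share the same wire delay, so the edge is a valid
        # capture point without a globally synchronized clock.
        if strobe != prev:
            slices.append(data)
            prev = strobe
    if not slices:
        raise ValueError("no strobe transitions observed")
    width = packet_bits // len(slices)
    packet = 0
    for i, s in enumerate(slices):
        packet |= s << (i * width)
    return packet


if __name__ == "__main__":
    pkt = 0xABCD_1234_5678_9ABC_DEF0   # an arbitrary 80-bit value
    wire = transmit(pkt, 80)
    print(len(wire), "slices of", 80 // SER_RATIO, "bits")  # 16 slices of 5 bits
    assert receive(wire, 80) == pkt
```

The design point the sketch highlights is the area trade-off: an 80-bit parallel bus shrinks to a 5-wire serial link under these assumptions, at the cost of running those wires 16 times faster.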
The strobe and the packet experience the same wire delay, so they arrive without skew relative to each other. A forward-network packet consists of a 32b address field, a 32b data field, and a 16b header field, whereas a backward-network packet has no address field. The packet header, generated by a network interface, contains routing information, the burst-length type, a read/write command, an acknowledgment request, and a QoS level.
The global link connecting clusters in the 2nd-level star topology is usually several millimeters long. By using overdrivers [6], clocked sense amplifiers, and twisted differential signaling, packets are transmitted reliably with less than 600mV swing. The transceiver sizes and the overdrive voltage are chosen to obtain a 200mV separation at the receiver end, as shown in Fig. 8.2.3. A 5mm global link with a 1.6µm wire pitch can carry a packet at 1.6GHz with 320ps wire delay and consumes 35pJ/packet (= 0.35pJ/bit). In contrast, a full-swing link consumes up to 3x more power and requires additional area for repeaters.
A crossbar switch for intra-cluster packets performs buffer-less cut-through switching to minimize ...
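The forward and backward packet formats described earlier can be sketched as plain bit packing. Only the field widths (16b header, 32b address, 32b data) come from the text. The bit ordering (header in the most-significant bits) is an assumption, and the header is treated as an opaque 16-bit word because the widths of its sub-fields (routing, burst length, read/write, acknowledgment, QoS) are not given. All function names are hypothetical.

```python
# Packet assembly sketch. Field widths are from the text; the bit ordering
# (header | address | data, header most significant) is an assumption, and
# the 16-bit header is kept opaque since its sub-field layout is not given.

HEADER_BITS, ADDR_BITS, DATA_BITS = 16, 32, 32
FORWARD_BITS = HEADER_BITS + ADDR_BITS + DATA_BITS   # Master-to-Slave packet
BACKWARD_BITS = HEADER_BITS + DATA_BITS              # Slave-to-Master packet


def _fits(value: int, bits: int, name: str) -> None:
    """Reject field values that would overflow their bit width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value:#x} does not fit in {bits} bits")


def forward_packet(header: int, address: int, data: int) -> int:
    """Pack header | address | data into one forward-network word."""
    _fits(header, HEADER_BITS, "header")
    _fits(address, ADDR_BITS, "address")
    _fits(data, DATA_BITS, "data")
    return (header << (ADDR_BITS + DATA_BITS)) | (address << DATA_BITS) | data


def backward_packet(header: int, data: int) -> int:
    """Pack header | data; backward packets carry no address field."""
    _fits(header, HEADER_BITS, "header")
    _fits(data, DATA_BITS, "data")
    return (header << DATA_BITS) | data


def unpack_forward(packet: int) -> tuple[int, int, int]:
    """Split a forward packet back into (header, address, data)."""
    addr_mask = (1 << ADDR_BITS) - 1
    data_mask = (1 << DATA_BITS) - 1
    return (packet >> (ADDR_BITS + DATA_BITS),
            (packet >> DATA_BITS) & addr_mask,
            packet & data_mask)


if __name__ == "__main__":
    fwd = forward_packet(0xABCD, 0x12345678, 0x9ABCDEF0)
    print(f"{FORWARD_BITS}-bit forward packet: {fwd:#x}")
    print(f"{BACKWARD_BITS}-bit backward packet: {backward_packet(0xABCD, 0x9ABCDEF0):#x}")
```

Dropping the address field shortens the backward packet from 80 to 48 bits, which is why the two networks can be sized independently.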
Abstract: With the increasing complexity of systems-on-chip, multi-hop switched networks-on-chip (NoCs) require low end-to-end latency to guarantee QoS. An arbitration lookahead scheme is proposed to reduce the end-to-end packet latency in the NoC. Packet arbitration at each switch is completed a few cycles in advance of the packet's arrival. As a result, a packet bypasses the switch without incurring input-queuing and arbitration latency. The scheme is analyzed on a 4x4 mesh topology. Under random traffic, it achieves a maximum latency reduction of 65% and an average reduction of 26%; under multimedia traffic, the latency reduction reaches up to 36%.
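Why lookahead helps can be seen in a toy zero-load latency model on a 4x4 mesh. This is an illustrative sketch, not the authors' simulator: it assumes minimal (e.g. XY) routing, no contention, and hypothetical one-cycle costs for queuing, arbitration, crossbar traversal, and each link. With lookahead, arbitration has already finished when the packet arrives, so each switch costs only its traversal time. Because the cycle costs are made up, the model does not reproduce the reported 65%/26%/36% figures.

```python
# Toy zero-load latency model for arbitration lookahead on a 4x4 mesh.
# Assumptions: minimal (XY) routing, no contention, and hypothetical
# one-cycle costs below; these are NOT values from the paper.

from itertools import product

MESH = 4                 # 4x4 mesh, as in the abstract
QUEUE_CYCLES = 1         # hypothetical: input-queue write/read
ARB_CYCLES = 1           # hypothetical: switch arbitration
TRAVERSE_CYCLES = 1      # hypothetical: crossbar traversal
LINK_CYCLES = 1          # hypothetical: switch-to-switch link


def switches_on_path(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Switches visited on a minimal path, counting source and destination."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1]) + 1


def latency(src: tuple[int, int], dst: tuple[int, int], lookahead: bool) -> int:
    """Zero-load latency in cycles. With lookahead, arbitration was done
    ahead of arrival, so queuing and arbitration are skipped at each switch."""
    n = switches_on_path(src, dst)
    per_switch = TRAVERSE_CYCLES
    if not lookahead:
        per_switch += QUEUE_CYCLES + ARB_CYCLES
    return n * per_switch + (n - 1) * LINK_CYCLES


def total_latency_reduction() -> float:
    """Fractional reduction in summed latency over all distinct node pairs."""
    nodes = list(product(range(MESH), repeat=2))
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    base = sum(latency(s, d, False) for s, d in pairs)
    fast = sum(latency(s, d, True) for s, d in pairs)
    return 1 - fast / base


if __name__ == "__main__":
    corner = ((0, 0), (MESH - 1, MESH - 1))
    print("corner-to-corner:", latency(*corner, False), "->", latency(*corner, True), "cycles")
    print(f"toy total reduction: {total_latency_reduction():.0%}")
```

The model ignores the cost of delivering the lookahead request itself and any arbitration failures under contention; in a real network those effects determine how often a packet actually gets the bypass.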