Recent high-performance IC design has been dominated by power density constraints. 3D integration increases device density even further, and these devices will not be usable without viable strategies to reduce power consumption. This paper proposes the use of near-threshold computing (NTC) to address this issue in a stacked 3D system. In NTC, cores are operated near the threshold voltage (~200mV above Vth) to optimally balance power and performance [1]. In Centip3De, we operate cores at 650mV, as opposed to the wear-out limited supply voltage of 1.5V. This improves measured energy efficiency by 5.1×. The dramatically lower power consumption of NTC makes it an attractive match for 3D design, which has limited power dissipation capabilities, but also has improved innate power and performance compared to 2D design.
Due to higher leakage current in SRAMs compared to logic, memories reach their optimal energy/delay trade-off at higher voltages than cores: 870mV for SRAM and 670mV for logic in 130nm technology. Hence, SRAMs ideally operate at a higher voltage than cores, improving their speed. Centip3De exploits this unique cache/core performance inversion by connecting four cores to each cache, where each cache operates at 4× the core frequency and communicates with the cores in a round-robin fashion. This configuration has the added advantage of automatically resolving coherence within the cluster, which reduces coherence traffic and overhead.
To address Amdahl's law, Centip3De allows some cores in a cluster to be boosted by 2×, 4×, or 8× in frequency by ramping them to a higher voltage while disabling the remaining cores in the cluster to offset the higher power consumption. Disabling the non-boosted cores opens up more of the cache to the boosted core, providing it with additional memory performance. In this way, Centip3De can be configured to maximize single-threaded performance, throughput, or a mixture of both, depending on workload.
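The round-robin sharing of one cache among four cores can be illustrated with a minimal sketch. It assumes the simplest possible arbitration, in which cache cycle n is given to core n mod 4. The abstract does not specify the arbitration logic at this level, and the function names below are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: assumes a fixed round-robin slot assignment,
# which the abstract does not spell out in this detail.

def cache_slot_owner(cache_cycle: int, cores_per_cluster: int = 4) -> int:
    """Return the index of the core served in a given cache clock cycle."""
    return cache_cycle % cores_per_cluster


def slots_per_core_cycle(cache_to_core_ratio: int = 4,
                         cores_per_cluster: int = 4) -> float:
    """Cache slots each core receives per core clock cycle."""
    return cache_to_core_ratio / cores_per_cluster


# Eight consecutive cache cycles span two core cycles when the cache
# runs at 4x the core frequency.
schedule = [cache_slot_owner(c) for c in range(8)]
print(schedule)
print(slots_per_core_cycle())
```

Under these assumptions each core receives one cache slot per core cycle, so from a core's point of view the shared cache behaves like a private single-cycle cache, and only one copy of each line exists within the cluster.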
The fabricated Centip3De system consists of two stacked dies with 64 ARM M3 near-threshold cores that make up 16 four-core clusters, each connected to a 4-way 1kB instruction cache and a 4-way 8kB data cache. The caches communicate over a 3D bus that connects them to DRAM controllers that form the backing store for the caches. Centip3De is designed to be expandable to 4 layers of cores/caches with 2-3 layers of stacked DRAM. This paper provides results for a two-layer system (referred to as the fabricated system), but for completeness we also describe the complete design that will consist of 128 cores, 4-layer logic + 3-layer DRAM (referred to as the expanded system). Final assembly of the expanded system is anticipated at a later date. Figure 10.7.1 shows the floorplan for a cluster, which separates caches and cores into adjacent layers. The four cores communicate with their adjacent cache through a face-to-face (F2F) 3D interface, which reduces routing resource requirements by providing 331 interface connections in the middle of each core. The F2F interconnects have a pitch of 5μm and a loading ...
The power target for exascale supercomputing is 20MW, with about 30% budgeted for the memory subsystem. Commodity DRAMs will not satisfy this requirement. Additionally, the large number of memory chips (>10M) required will result in crippling failure rates. Although specialized DRAM memories have been reorganized to reduce power through 3D-stacking or row buffer resizing, their implications on fault tolerance have not been considered. We show that addressing reliability and energy is a co-optimization problem involving tradeoffs between error correction cost, access energy and refresh power: reducing the physical page size to decrease access energy increases the energy/area overhead of error resilience. Additionally, power can be reduced by optimizing bitline lengths. The proposed 3D-stacked memory uses a page size of 4kb and consumes 5.1pJ/bit based on simulations with NEK5000 benchmarks. Scaled to 100PB, the memory consumes 4.7MW at 100PB/s, which is well within the total power budget (20MW), while also being error-resilient.
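The quoted figures can be cross-checked with a short calculation. This is a rough sketch under stated assumptions: decimal petabytes (10^15 bytes), 8 bits per byte, and the 5.1pJ/bit figure applied to every transferred bit. The abstract does not break down the 4.7MW total, so the gap between the computed access power and that total is left unattributed here.

```python
# Back-of-the-envelope check of the abstract's power figures.
# Assumptions (not stated in the source): 1 PB = 1e15 bytes, 8 bits/byte,
# and 5.1 pJ is spent on every bit transferred at full bandwidth.

PJ_PER_BIT = 5.1            # from the abstract
BANDWIDTH_PB_PER_S = 100    # from the abstract
TOTAL_BUDGET_MW = 20        # from the abstract
MEMORY_SHARE = 0.30         # "about 30%" in the abstract
REPORTED_MEMORY_MW = 4.7    # from the abstract

bits_per_second = BANDWIDTH_PB_PER_S * 1e15 * 8
access_power_mw = PJ_PER_BIT * 1e-12 * bits_per_second / 1e6
memory_budget_mw = TOTAL_BUDGET_MW * MEMORY_SHARE

print(f"access power at full bandwidth: {access_power_mw:.2f} MW")
print(f"memory budget (30% of 20 MW):   {memory_budget_mw:.1f} MW")
print(f"reported total within budget:   {REPORTED_MEMORY_MW <= memory_budget_mw}")
```

Under these assumptions the access energy alone accounts for roughly 4.1MW, and the reported 4.7MW fits inside the roughly 6MW memory share of the 20MW target.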