Abstract: To secure correct system operation, circuit designers have deployed a plethora of Reliability, Availability and Serviceability (RAS) techniques. RAS mechanisms, however, come at the cost of extra clock cycles. In addition, a wide variety of dynamic workloads and different input conditions often make preemptive dependability techniques hard to implement. To this end, we focus on a realistic case study of a closed-loop controller that mitigates performance variation with a reactive response. This …
“…This requires a control process overhead which is apt to diminish or cancel the efficiency gains of potentially very high processing speeds. New methods are being proposed for ultra-scaled digital microchips (Noltsis, Zambelis, Catthoor, & Soudris, 2019) to remove that overhead and make fuller use of the fast processing available at the physical level while retaining timing guarantees. The timing challenges that are connected with deeply scaled digital microchips have some surprising connections with challenges in non-digital computing.…”
3D-stacked processor-memory systems stack memory (DRAM banks) directly on top of logic (CPU cores) using chiplet-on-chiplet packaging technology to provide next-level computing performance in embedded platforms. Stacking, however, severely increases the system’s power density without any accompanying increase in heat dissipation capacity. Consequently, 3D-stacked processor-memory systems suffer more severe thermal issues than their non-stacked counterparts. Nevertheless, 3D-stacked processor-memory systems do inherit power (thermal) management knobs from their non-stacked predecessors, namely Dynamic Voltage and Frequency Scaling (DVFS) for cores and Low Power Mode (LPM) for memory banks. In the context of 3D-stacked processor-memory systems, DVFS and LPM are deeply intertwined in both performance and power. Using them independently rather than in a unified manner results in sub-optimal thermal management. The unified use of DVFS and LPM for thermal management of 3D-stacked processor-memory systems remains unexplored. The lack of an LPM implementation in thermal simulators for 3D-stacked processor-memory systems hinders representative real-world evaluation of a unified approach.
We extend the state-of-the-art interval thermal simulator for 3D-stacked processor-memory systems CoMeT with an LPM power management knob for memory banks. We also propose a learning-based thermal management technique for 3D-stacked processor-memory systems that employ DVFS and LPM in a unified manner. Detailed interval thermal simulations with the extended CoMeT framework show a 10.15% average response time improvement with the PARSEC and SPLASH-2 benchmark suites, along with widely-used Deep Neural Network (DNN) workloads against a state-of-the-art thermal management technique for 2.5D processor-memory systems (ported directly to 3D-stacked processor-memory systems) that also proposes unified use of DVFS and LPM.