In deep submicron circuits, elevation in temperatures has brought new challenges in reliability, timing, performance, cooling costs and leakage power. Conventional thermal management techniques sacrifice performance to control the thermal behavior by slowing down or turning off the processors when a critical temperature threshold is exceeded. Moreover, studies have shown that in addition to high temperatures, temporal and spatial variations in temperature impact system reliability. In this work, we explore the benefits of thermally aware task scheduling for multiprocessor systems-on-a-chip (MPSoC). We design and evaluate OS-level dynamic scheduling policies with negligible performance overhead. We show that, using simple to implement policies that make decisions based on temperature measurements, better temporal and spatial thermal profiles can be achieved in comparison to stateof-art schedulers. We also enhance reactive strategies such as dynamic thread migration with our scheduling policies. This way, hot spots and temperature variations are decreased, and the performance cost is significantly reduced.
Abstract-Liquid cooling has emerged as a promising solution for addressing the elevated temperatures in 3D stacked architectures. In this work, we first propose a framework for detailed thermal modeling of the microchannels embedded between the tiers of the 3D system. In multicore systems, workload varies at runtime, and the system is generally not fully utilized. Thus, it is not energy-efficient to adjust the coolant flow rate based on the worst-case conditions, as this would cause an excess in pump power. For energy-efficient cooling, we propose a novel controller to adjust the liquid flow rate to meet the desired temperature and to minimize pump energy consumption. Our technique also includes a job scheduler, which balances the temperature across the system to maximize cooling efficiency and to improve reliability. Our method guarantees operating below the target temperature while reducing the cooling energy by up to 30%, and the overall energy by up to 12% in comparison to using the highest coolant flow rate.
Abstract-Designing thermal management strategies that reduce the impact of hot spots and on-die temperature variations at low performance cost is a very significant challenge for multiprocessor system-on-chips (MPSoCs). In this work, we present a proactive MPSoC thermal management approach, which predicts the future temperature and adjusts the job allocation on the MPSoC to minimize the impact of thermal hot spots and temperature variations without degrading performance. In addition, we implement and compare several reactive and proactive management strategies, and demonstrate that our proactive temperatureaware MPSoC job allocation technique is able to dramatically reduce the adverse effects of temperature at very low performance cost. We show experimental results using a simulator as well as an implementation on an UltraSPARC T1 system.
Abstract-Thermal hot spots and temperature gradients on the die need to be minimized to manufacture reliable systems while meeting energy and performance constraints. In this work, we solve the task scheduling problem for multiprocessor system-on-chips (MPSoCs) using Integer Linear Programming (ILP). The goal of our optimization is minimizing the hot spots and balancing the temperature distribution on the die for a known set of tasks. Under the given assumptions about task characteristics, the solution is optimal. We compare our technique against optimal scheduling methods for energy minimization, energy balancing, and hot spot minimization, and show that our technique achieves significantly better thermal profiles. We also extend our technique to handle workload variations at runtime.
Many companies deploy multiple data centers across the globe to satisfy the dramatically increased computational demand. Wide area connectivity between such geographically distributed data centers has an important role to ensure both the quality of service, and, as bandwidths increase to 100Gbps and beyond, as an efficient way to dynamically distribute the computation. The energy cost of data transmission is dominated by the router power consumption, which is unfortunately not energy proportional. In this paper we not only quantify the performance benefits of leveraging the network to run more jobs, but also analyze its energy impact. We compare the benefits of redesigning routers to be more energy efficient to those obtained by leveraging locally available green energy as a complement to the brown energy supply. Furthermore, we design novel green energy aware routing policies for wide area traffic and compare to state-of-the-art shortest path routing algorithm. Our results indicate that using energy proportional routers powered in part by green energy along with our new routing algorithm results in 10x improvement in per router energy efficiency with 36% average increase in the number of jobs completed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.