2019
DOI: 10.1145/3323055
Application and Thermal-reliability-aware Reinforcement Learning Based Multi-core Power Management

Abstract: Power management through dynamic voltage and frequency scaling (DVFS) is one of the most widely adopted techniques. However, it impacts application reliability (due to soft errors, circuit aging, and deadline misses). In addition, increased power density impacts the thermal reliability of the chip, sometimes leading to permanent failure. To balance both application- and thermal-reliability while achieving power savings and maintaining performance, we propose application- and thermal-reliability-aware reinforcement…


Cited by 21 publications (6 citation statements)
References 26 publications
“…Although the RL technique has been reported to be lightweight and highly suitable for the systems, compared to other types of learning techniques [28], the main issues are its convergence and timing overhead. Accordingly, similar to other studies [31], we have reduced the feasible actions to reduce the complexity and convergence issues. In the following, we investigate the timing and memory overheads of the employed learning technique.…”
Section: Investigating the Timing and Memory Overheads of ML Technique
confidence: 88%
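Restricting the set of feasible actions, as the citing authors describe, is a common way to shrink an RL agent's search space and ease convergence. The sketch below is purely illustrative (the frequency levels and the neighbor-only transition rule are assumptions, not the cited paper's actual policy): it limits DVFS actions to staying at, or stepping one level away from, the current voltage/frequency level.

```python
# Hypothetical sketch: restricting feasible DVFS actions to neighboring
# voltage/frequency levels, one way to reduce an RL agent's action space
# and ease convergence. Levels and the transition rule are illustrative.

FREQ_LEVELS_MHZ = [400, 800, 1200, 1600, 2000]  # assumed V/f levels

def feasible_actions(current_level: int) -> list[int]:
    """Allow only 'step down', 'stay', or 'step up' transitions."""
    candidates = [current_level - 1, current_level, current_level + 1]
    return [l for l in candidates if 0 <= l < len(FREQ_LEVELS_MHZ)]

# Full action space: 5 choices per state; restricted: at most 3.
print(feasible_actions(0))  # [0, 1]
print(feasible_actions(2))  # [1, 2, 3]
```

With the neighbor-only rule, the Q-table still has one column per frequency level, but most state-action pairs are never explored, which lowers both the timing and memory overheads discussed above.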
“…The Q-learning/SARSA technique, which has recently been used in many emerging applications such as robotics and Unmanned Aerial Vehicles (UAVs) [29,30], uses the RL technique to perform runtime management/optimization of system properties in single- or multi-core processors. The general Q-learning/SARSA technique consists of three main components [31,32]: (1) a discrete set of states S = {s 1 , s 2 , ..., s l }, (2) a discrete set of actions A = {a 1 , a 2 , ..., a k }, and (3) a reward function R. The states and actions determine the rows and columns of the Q-table of the learning-based algorithm, respectively (shown in Figure 2). The algorithm collects the current state s t and determines the next action a t (a t ∈ A).…”
Section: Learning-based System Properties Optimization
confidence: 99%
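The Q-table structure described in that excerpt (rows for states, columns for actions, driven by a reward function R) can be sketched minimally as follows. The state/action sizes, learning parameters, and reward value here are illustrative placeholders, not the paper's actual formulation.

```python
import random

# Minimal Q-learning sketch matching the description above: a Q-table
# whose rows are the discrete states S and whose columns are the
# discrete actions A, plus the standard update rule. All parameter
# values are assumed for illustration.

N_STATES, N_ACTIONS = 4, 3          # |S| = l = 4, |A| = k = 3 (assumed)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # rows: states, cols: actions

def choose_action(s: int) -> int:
    """Epsilon-greedy selection of the next action a_t in A."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])

def update(s: int, a: int, reward: float, s_next: int) -> None:
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next])
    Q[s][a] += ALPHA * (reward + GAMMA * best_next - Q[s][a])

# One illustrative step: observe state 0, act, receive a reward, update.
s = 0
a = choose_action(s)
update(s, a, reward=1.0, s_next=1)
```

SARSA differs only in the update: it uses the Q-value of the action actually chosen in s_next rather than the maximum over all actions.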
“…Several works have employed RL for power/thermal optimization [31]. The works in [18], [20], [21] use RL for power management via DVFS. However, they consider neither temperature nor QoS.…”
Section: Related Work
confidence: 99%
“…As a result, most DVFS techniques [32,36,40,42,48] relying on a predefined temperature prediction model would not work properly for mobile devices. Furthermore, recent supervised learning-based approaches [8,11,45,53] can only adapt to environments they were previously trained on. Thus, their thermal management in mobile settings has no performance guarantee.…”
Section: Introduction
confidence: 99%