Proceedings of the 51st Annual Design Automation Conference 2014
DOI: 10.1145/2593069.2593199

Reinforcement Learning-Based Inter- and Intra-Application Thermal Optimization for Lifetime Improvement of Multicore Systems

Abstract: The thermal profile of multicore systems varies both within an application's execution (intra) and when the system switches from one application to another (inter). In this paper, we propose an adaptive thermal management approach to improve the lifetime reliability of multicore systems by considering both inter- and intra-application thermal variations. Fundamental to this approach is a reinforcement learning algorithm, which learns the relationship between the mapping of threads to cores, the frequency of …
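The abstract names the ingredients of the approach (thread-to-core mapping, core frequency, and a reinforcement-learning loop), but the exact state, action and reward definitions are cut off. The following is a minimal, generic tabular Q-learning sketch under entirely hypothetical assumptions: a toy first-order thermal model, three frequency levels, two mapping options, and a reward that trades throughput against peak temperature and temperature swings (a rough proxy for thermal cycling). None of these names or constants come from the paper; it only illustrates how such a learner can couple temperature observations to frequency and mapping decisions.

```python
import random
from collections import defaultdict

# Hypothetical action space: (frequency in GHz, thread-to-core mapping id).
FREQS = [1.0, 1.5, 2.0]
MAPPINGS = [0, 1]  # 0: threads packed on adjacent cores, 1: threads spread out
ACTIONS = [(f, m) for f in FREQS for m in MAPPINGS]


def toy_thermal_model(temp, action):
    """Hypothetical next-temperature model in deg C (not from the paper)."""
    freq, mapping = action
    # Higher frequency heats the chip; spreading threads cools it slightly.
    target = 45.0 + 20.0 * freq - 5.0 * mapping
    # First-order step halfway toward the steady-state temperature.
    return temp + 0.5 * (target - temp)


def discretize(temp):
    """Map a temperature to a discrete state: 5 deg C bins, capped at bin 20."""
    return min(int(temp // 5), 20)


def reward(prev_temp, temp, action):
    """Hypothetical reward: throughput minus peak-heat and cycling penalties."""
    freq, _ = action
    heat_penalty = max(0.0, temp - 75.0) * 0.5   # penalize running above 75 deg C
    cycle_penalty = abs(temp - prev_temp) * 0.1  # penalize large temperature swings
    return freq - heat_penalty - cycle_penalty


def q_learning(episodes=200, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning over the toy thermal model."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated value
    for _ in range(episodes):
        temp = 50.0
        for _ in range(steps):
            s = discretize(temp)
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])  # exploit
            new_temp = toy_thermal_model(temp, a)
            r = reward(temp, new_temp, a)
            s2 = discretize(new_temp)
            best_next = max(q[(s2, act)] for act in ACTIONS)
            # Standard Q-learning temporal-difference update.
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            temp = new_temp
    return q


def greedy_policy(q, temp):
    """Pick the (frequency, mapping) pair with the highest learned value."""
    s = discretize(temp)
    return max(ACTIONS, key=lambda act: q[(s, act)])
```

For example, `greedy_policy(q_learning(), 60.0)` returns the (frequency, mapping) pair the toy learner prefers at 60 deg C. A real controller would replace the toy model with on-chip sensor readings and use whatever state and reward the paper actually defines.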

Cited by 78 publications (77 citation statements)
References 17 publications

“…In [27], a design methodology that minimizes energy consumption of and temperature-induced wear on multiprocessor systems is introduced; yet neither energy nor temperature is modeled with an awareness of uncertainty due to process variation. A similar observation can be made with respect to the work reported in [28] where a reinforcement-learning algorithm is used in order to improve the lifetime of multiprocessor systems. An extensive survey of reliability-aware system-level design techniques given in [26] confirms the trend emphasized above: the widespread device-level models of failure mechanisms generally ignore the impact of process variation on temperature.…”
Section: Previous Work (supporting, confidence: 70%)
“…Similarly, workload uncertainty has not been deprived of attention; see, for instance, [32,88,96,98,105,124]. Aging uncertainty has also been studied extensively in the literature; see, for instance, [24,28,39,50,61,83]. However, certain important problems have not been addressed yet, and in the case of the ones that have been considered, the proposed solutions are often restricted in use, which is due in part to the unrealistic assumptions that these solutions make.…”
Section: Previous Work (mentioning, confidence: 99%)
“…However, as shown in [Faruque et al 2010], these approaches cannot guarantee to minimize a system's thermal overhead effectively for all applications. A cross-layer thermal optimization technique is proposed in [Das et al 2014] to manage temperature-related emergencies. Although these studies have shown improvement in thermal profile leading to extended lifetime reliability using scaled voltage and frequency, thermal cycling and energy consumption are not jointly addressed.…”
Section: Related Work (mentioning, confidence: 99%)
“…As shown in [Das et al 2014], temperature of an embedded system can be controlled significantly by controlling the processor power states (i.e., their voltage and frequency) and the application thread allocation (that limits context switching). However, the amount of thermal control achieved using these control levers is dependent on the application, its cross-layer interaction with the system software and the hardware, and also on the working environment.…”
Section: Motivation for Machine Learning (mentioning, confidence: 99%)