2009
DOI: 10.1145/2492101.1555369

Evaluating the impact of job scheduling and power management on processor lifetime for chip multiprocessors

Abstract: Temperature-induced reliability issues are among the major challenges for multicore architectures. Thermal hot spots and thermal cycles combine to degrade reliability. This research presents new reliability-aware job scheduling and power management approaches for chip multiprocessors. Accurate evaluation of these policies requires a novel simulation framework that can capture architecture-level effects over tens of seconds or longer, while also capturing thermal interactions among cores resulting from dynamic …

Cited by 16 publications (19 citation statements)
References 37 publications
“…In this case, we have η = ln(1 + N D_max ε) and z_i(t) = s_i(t) in (12). Let λ_t and l_{i,t} denote the dual variables associated with the constraints s_i(t) ≥ D(t) ∀t and s_i(t) ≥ 0 ∀i, t in (12), respectively. Applying the KKT conditions to the dual problem of (12), we have:…”
Section: Discussion (mentioning)
confidence: 99%
“…We further extend the model to the situation where each data center has access to local renewable energy, or has a certain protection mechanism and delay tolerance. For instance, protection mechanisms can reduce the wear-and-tear cost and the corresponding risk involved in the state toggling of servers [12], while delay-tolerant workload is less sensitive to the latency for toggling servers out of power-saving mode. On the other hand, time-varying renewable energy supply can help reduce both the operational and switching costs.…”
Section: Related Work (mentioning)
confidence: 99%
“…The first Dynamic Reliability Management (DRM) approach [72] focusing on a single general purpose processor was proposed in 2004. After that, following also the architectural progresses in the subsequent years, different types of platforms have been considered spanning from the classical homogeneous multi-core architecture [74] [75] [76], where processing units are connected on a single bus and with a shared memory, to the NoC-based many-core architecture [77] [78] [79]. Recently, heterogeneous architectures [80] [81] [82], integrating asymmetric processors, GPUs or custom accelerators, have been also addressed in lifetime management.…”
Section: Reliability (mentioning)
confidence: 99%
“…Recently, heterogeneous architectures [80] [81] [82], integrating asymmetric processors, GPUs or custom accelerators, have been also addressed in lifetime management. Depending on the specific architecture, the resource management approaches act on application mapping (as in the case of many-cores architectures [78] [83]), scheduling (as in the case of shared-memory systems [74]), and/or on power-related knobs (DVFS and per-core power gating [72] [74] [75]). Another relevant aspect is that lifetime is only one of considered parameters, thus leading in most of the approaches to a co-optimization with performance or power/energy-consumption.…”
Section: Reliability (mentioning)
confidence: 99%
“…For example, the authors of [24] show the effect of thermal cycles on lifetime reliability, but do not put forward an online approach to voltage scaling taking into account the thermal cycles.…”
Section: Hybrid DVFS Schemes (mentioning)
confidence: 99%