2022
DOI: 10.1007/978-3-031-23220-6_14

On the Convergence of Malleability and the HPC PowerStack: Exploiting Dynamism in Over-Provisioned and Power-Constrained HPC Systems

Abstract: Recent High-Performance Computing (HPC) systems suffer from severe problems, such as massive power consumption, while at the same time leaving system resources significantly under-utilized. Given the power consumption trends, future systems will be deployed in an over-provisioned manner, where more resources are installed than can be powered simultaneously. In such a scenario, maximizing resource utilization and energy efficiency while staying within a given power constraint is pivotal. Driven by this obser…

Citations: cited by 7 publications (1 citation statement)
References: 30 publications
“…Many centers focus on developing such monitoring infrastructures, often with the goal of system tracking and tuning. A prominent example, currently deployed on production systems at LRZ and extended in the projects DEEP-SEA [37] and REGALE [38], is the Data Center Data Base (DCDB) [39], [40], which is capable of routinely tracking millions of sensors on large-scale production systems, such as SuperMUC-NG, using technologies from the IoT space combined with a federation of time-series databases built on top of Cassandra. Similarly, the ADMIRE project [41] is building an entirely new measurement infrastructure relying on the Prometheus time-series database (TSDB) connected to a node-level aggregating push gateway coupled with LIMITLESS [42] for node-level monitoring and high-speed spatial reduction based on a tree-based overlay network (TBON).…”
Section: B Monitoring and Modeling
Citation type: mentioning
confidence: 99%