Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers 2015
DOI: 10.1145/2768405.2768413

Quantifying Scheduling Challenges for Exascale System Software

Abstract: The move towards high-performance computing (HPC) applications comprised of coupled codes and the need to dramatically reduce data movement is leading to a reexamination of time-sharing vs. space-sharing in HPC systems. In this paper, we discuss and begin to quantify the performance impact of a move away from strict space-sharing of nodes for HPC applications. Specifically, we examine the potential performance cost of time-sharing nodes between application components, we determine whether a simple coordinated …

Cited by 7 publications (3 citation statements)
References 27 publications (21 reference statements)
“…The authors of [32] describe some of the disadvantages of applying quadratic programming to the problem of mapping virtual machines to NUMA domains. They show that when the problem is formulated as a nonconvex quadratic program, the solutions are suboptimal, and that reducing the complexity by dividing the mapping problem into subproblems can improve the algorithm.…”
Section: Related Work
confidence: 99%
“…However, as node counts increase, the likelihood of one task experiencing an interruption becomes a key factor in performance; as the BSP per-loop iteration time decreases, the effects of even smaller noise perturbations grow, further emphasizing the usefulness of tightly synchronized clocks. To mitigate the potentially disastrous hit to performance, sophisticated runtime systems and operating systems seek to overlap any disruption in progress across all compute nodes. When system software is able to schedule all interrupting tasks on every node so that they occur simultaneously, the cascading effect on the parallel algorithm is kept in check, a strategy known as coordinated scheduling.…”
Section: Importance of Time Agreement
confidence: 99%
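The cascading effect described in the statement above can be sketched with a toy simulation. In a BSP loop, each iteration ends at a barrier, so its duration is the maximum of the per-node compute times; with uncoordinated noise, almost every iteration on a large machine contains at least one interrupted node, while coordinated scheduling confines the cost to the fraction of iterations in which the interruption actually fires. All parameter values below (node count, noise magnitude, interruption probability) are illustrative assumptions, not measurements from the paper:

```python
import random

def bsp_total_time(nodes, iters, base, noise, p, coordinated, rng):
    """Simulate the total runtime of a bulk-synchronous (BSP) loop.

    Each iteration's duration is the MAX over all per-node compute
    times (the barrier waits for the slowest node). An interrupted
    node takes base + noise instead of base.
    """
    total = 0.0
    for _ in range(iters):
        if coordinated:
            # all interruptions are scheduled to overlap on every node
            hit = rng.random() < p
            times = [base + (noise if hit else 0.0) for _ in range(nodes)]
        else:
            # each node is interrupted independently of the others
            times = [base + (noise if rng.random() < p else 0.0)
                     for _ in range(nodes)]
        total += max(times)
    return total

rng = random.Random(0)
uncoord = bsp_total_time(1024, 1000, 1.0, 0.5, 0.01, False, rng)
rng = random.Random(0)
coord = bsp_total_time(1024, 1000, 1.0, 0.5, 0.01, True, rng)
print(uncoord, coord)
```

With 1024 nodes and a 1% per-node interruption probability, nearly every uncoordinated iteration pays the full noise penalty (total near 1500 time units), whereas the coordinated run pays it in only about 1% of iterations (total near 1005), illustrating why the per-node cost compounds with node count.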
“…The scheduler must guarantee precise CPU reservations for cooperative codes and accurate timing to run applications with coordination needs across nodes. In this scenario, precise inter-node time agreement is critical. Moreover, the absence of suitable time agreement prevents effective gang scheduling and coordinated scheduling.…”
Section: Importance of Time Agreement
confidence: 99%
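The dependence of coordinated scheduling on inter-node time agreement can likewise be illustrated with a toy model: interruptions fire at the same nominal time everywhere, but each node observes them shifted by its clock offset, so once the skew exceeds the iteration length a single scheduled interruption stretches several BSP iterations instead of one. The node counts, skew values, and interruption schedule below are assumptions chosen for illustration:

```python
import random

def delayed_iterations(nodes, n_interrupts, iter_len, skew, rng):
    """Count how many BSP iterations are stretched by scheduled noise
    when per-node clocks disagree by up to `skew`. Every iteration that
    contains at least one node's interruption is delayed at the barrier."""
    offsets = [rng.uniform(0.0, skew) for _ in range(nodes)]
    delayed = set()
    for j in range(n_interrupts):
        nominal = j * 10 * iter_len  # one interruption every 10 iterations
        for off in offsets:
            # iteration index in which this node observes the interruption
            delayed.add(int((nominal + off) // iter_len))
    return len(delayed)

rng = random.Random(1)
tight = delayed_iterations(64, 50, 1.0, 0.01, rng)  # well-synchronized clocks
loose = delayed_iterations(64, 50, 1.0, 3.0, rng)   # skew spans 3 iterations
print(tight, loose)
```

With tight synchronization every interruption lands in a single iteration (50 delayed iterations for 50 interruptions), while a skew of three iteration lengths smears each interruption across roughly three iterations, tripling the number of delayed barriers and defeating the point of coordination.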