2013
DOI: 10.1049/iet-cdt.2012.0059

On the trade‐off of mixing scientific applications on capacity high‐performance computing systems

Abstract: Network contention is seen as a major hurdle to achieving higher throughput in today's large-scale high-performance computing systems. This is even more so given the current trend of employing blocking networks, driven by the need to reduce cost. The effect is further aggravated by current system schedulers that allocate jobs as soon as nodes become available, thus producing job fragmentation: the tasks of one job might be spread throughout the system instead of being allocated contiguously. This fragment…
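The fragmentation mechanism described in the abstract can be illustrated with a toy model. This is a minimal sketch, not the paper's method: it assumes nodes are numbered along a single line (a hypothetical stand-in for network locality) and a scheduler that grants a job the lowest-numbered free nodes as soon as enough are free, with no contiguity check. The helper names `allocate_first_available`, `release` and `count_fragments` are illustrative only.

```python
"""Toy illustration of job fragmentation under first-available allocation.

Hypothetical model: nodes are a 1-D list of slots, and the scheduler hands a
job the lowest-numbered free nodes as soon as enough are free.
"""
from typing import List, Optional


def allocate_first_available(free: List[bool], n: int) -> Optional[List[int]]:
    """Mark and return the n lowest-indexed free nodes, or None if too few are free."""
    picked = [i for i, is_free in enumerate(free) if is_free][:n]
    if len(picked) < n:
        return None
    for i in picked:
        free[i] = False
    return picked


def release(free: List[bool], nodes: List[int]) -> None:
    """Return a finished job's nodes to the free pool."""
    for i in nodes:
        free[i] = True


def count_fragments(nodes: List[int]) -> int:
    """Count maximal runs of consecutive node indices (1 means contiguous)."""
    if not nodes:
        return 0
    ordered = sorted(nodes)
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b != a + 1)


free = [True] * 16
a = allocate_first_available(free, 4)  # nodes 0-3
b = allocate_first_available(free, 4)  # nodes 4-7
c = allocate_first_available(free, 4)  # nodes 8-11
d = allocate_first_available(free, 4)  # nodes 12-15
release(free, a)  # small jobs finish, leaving two separate holes
release(free, c)
big = allocate_first_available(free, 8)  # large job fills both holes
print(big, count_fragments(big))  # [0, 1, 2, 3, 8, 9, 10, 11] 2
```

The 8-node job starts immediately, which keeps utilization high, but it is split into two fragments separated by a running job. In a real multi-stage network such as a fat-tree, that split can force traffic through additional switch stages.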

Cited by 4 publications (5 citation statements)
References 17 publications (17 reference statements)
“…Several studies are dedicated to addressing workload interference at various system levels. Existing approaches include eliminating job interaction by separating large-sized and small-sized jobs into different system locations [9], providing contention-free routes for MPI collectives [28], and decreasing interference through job placement [16] [10] or network routing [8] on fat-tree networks. Workload interference is a very complicated problem, which could be caused by various configurations across the system stack.…”
Section: Introduction (mentioning)
“…In this work, we present an in-depth analysis of several representative applications on a pruned fat-tree system. Unlike the existing studies using socket-based co-simulation [8,10], our study is based on discrete event-driven, packet-level simulation using the Codesign of Exascale Storage System (CODES) that provides higher fidelity simulation [4,15,22,23,25,26]. Given that workload interference only impacts communication time, in this study we focus on communication cost.…”
Section: Introduction (mentioning)
“…However, as jobs with different resource requirements (mainly: number of nodes and execution time) arrive and leave the system in a non-deterministic fashion, allocating them to maximize system utilization will lead to a fragmentation of the resources assigned to the jobs: e.g., a job requiring a large node-count will be placed in nodes left free by previously running jobs requiring fewer nodes that have already finished. This problem becomes even more important in multi-stage networks, such as fat-trees (a common network topology in both the commercial and HPC domains), where fragmentation becomes more relevant the more spread out a job is on the system, as every stage increases the communication latency (up to a factor of 1.5 in state-of-the-art three-level multi-stage supercomputers [60]), and it also increases the probability of harmful interference leading to significant application performance degradation [30].…”
(mentioning)
“…However, although it is practically impossible to accurately measure the impact of interference directly in a production system, several authors have managed to measure it indirectly, by looking at the performance variability across several executions of the same job, arriving at the conclusion that interference is a main contributor to performance loss [31, 8, 30].…”
(mentioning)