2010 39th International Conference on Parallel Processing 2010
DOI: 10.1109/icpp.2010.18

Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

Abstract: Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching ca…

Citations: cited by 40 publications (21 citation statements)
References: 20 publications
“…Key execution performance characteristics of Sweep3D were revealed by Scalasca runtime summarization and automated event trace analyses, and refined employing source code annotations inserted for major iteration loops and code sections to direct instrumentation and analysis. In on-going research we are investigating automatic determination and combining of iterations with similar performance profiles [17], and analyzing traces for the root causes of wait states to improve attribution of performance problems [18]. Tools for measuring and analyzing application execution performance also need to be highly scalable themself [19], as demonstrated by the Scalasca toolset with several hundred thousand Sweep3D processes on Cray XT5 and IBM BG/P, where multiple techniques for effective data reduction and management are employed and application-oriented graphical presentation facilitated insight into load-balance problems that only become critical at larger scales.…”
Section: Results (citation type: mentioning)
confidence: 99%
“…Most of these works are focused on MPI; For example, performance models can help finding weak scaling issues [6]. A backward replay of an execution trace can be used for identifying the root cause of wait-states in MPI applications [5].…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…In this context the classical monitoring approaches are combined with elements from complex event processing [8][9][10] in which the progress of a workflow is captured as an ordered set of events, the so-called trace or lifeline [11][12][13]. Applications comprise the identification of wait states in large scale simulations [14], the analysis of for-loops [15], or the visualization of execution and wait times in distributed systems [16].…”
Section: Introduction (citation type: mentioning)
confidence: 99%