Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

Böhme, David; Geimer, Markus; Wolf, Felix; Arnold, Lukas

doi:10.1109/icpp.2010.18

Cited by 40 publications

(21 citation statements)

References 20 publications

“…Key execution performance characteristics of Sweep3D were revealed by Scalasca runtime summarization and automated event trace analyses, and refined employing source code annotations inserted for major iteration loops and code sections to direct instrumentation and analysis. In on-going research we are investigating automatic determination and combining of iterations with similar performance profiles [17], and analyzing traces for the root causes of wait states to improve attribution of performance problems [18]. Tools for measuring and analyzing application execution performance also need to be highly scalable themselves [19], as demonstrated by the Scalasca toolset with several hundred thousand Sweep3D processes on Cray XT5 and IBM BG/P, where multiple techniques for effective data reduction and management are employed and application-oriented graphical presentation facilitated insight into load-balance problems that only become critical at larger scales.…”

Section: Results (mentioning)

confidence: 99%

Large-Scale Performance Analysis of Sweep3d With the Scalasca Toolset

Wylie, Geimer, Mohr, et al. 2010

Parallel Process. Lett.

Self Cite

Cray XT and IBM Blue Gene systems present current alternative approaches to constructing leadership computer systems relying on applications being able to exploit very large configurations of processor cores, and associated analysis tools must also scale commensurately to isolate and quantify performance issues that manifest at the largest scales. In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on XT5 and BG/P systems, we investigated a progressive execution performance deterioration of the well-known ASCI Sweep3D compact application. Scalasca runtime summarization analysis quantified MPI communication time that correlated with computational imbalance, and automated trace analysis confirmed growing amounts of MPI waiting times. Further instrumentation, measurement and analyses pinpointed a conditional section of highly imbalanced computation which amplified waiting times inherent in the associated wavefront communication that seriously degraded overall execution efficiency at very large scales. By employing effective data collation, management and graphical presentation, in a portable and straightforward to use toolset, Scalasca was thereby able to demonstrate performance measurements and analyses with 294,912 processes.

“…Most of these works are focused on MPI; for example, performance models can help find weak scaling issues [6]. A backward replay of an execution trace can be used for identifying the root cause of wait-states in MPI applications [5].…”

Section: Related Work (mentioning)

confidence: 99%

ScalOMP: Analyzing the Scalability of OpenMP Applications

Daumen, Carribault, Thomas 2019

OpenMP: Conquering the Full Hardware Spectrum

Achieving good scalability from parallel codes is becoming increasingly difficult due to the hardware becoming more and more complex. Performance tools help developers but their use is sometimes complicated and very iterative. In this paper we propose a simple methodology for assessing the scalability and for detecting performance problems in an OpenMP application. This methodology is implemented in a performance analysis tool named ScalOMP that relies on the capabilities of OMPT for analyzing OpenMP applications. ScalOMP reports the code regions with scalability issues and suggests optimization strategies for those issues. The evaluation shows that ScalOMP incurs low overhead and that its suggestions lead to significant performance improvement of several OpenMP applications.

“…In this context the classical monitoring approaches are combined with elements from complex event processing [8][9][10] in which the progress of a workflow is captured as an ordered set of events, the so-called trace or lifeline [11][12][13]. Applications comprise the identification of wait states in large scale simulations [14], the analysis of for-loops [15], or the visualization of execution and wait times in distributed systems [16].…”

Section: Introduction (mentioning)

confidence: 99%

Performance analysis of concurrent workflows

Kempa-Liehr 2015

Journal of Big Data

Automated workflows are the key concept of big data pipelines in science, engineering and enterprise applications. The performance analysis of automated workflows is an important topic of the continuous improvement process and the foundation of designing new workflows. This paper introduces the concept of process evolution functions and event reduction policies, which allow for the time resolved visualization of an unlimited number of concurrent workflows by means of aggregated task views. The visualization allows for an intuitive approach to the performance analysis of concurrent workflows. The theoretical foundation of this approach is applicable for workflows represented by directed acyclic graphs. It is explained on the basis of a simple IO-workflow model, which is typically found for distributed resource management systems utilized for many-task computing.
