2009 17th Euromicro International Conference on Parallel, Distributed and Network-Based Processing
DOI: 10.1109/pdp.2009.50

Verifying Causality between Distant Performance Phenomena in Large-Scale MPI Applications

Abstract: In message-passing applications, the temporal or spatial distance between cause and symptom of a performance problem constitutes a major difficulty in deriving helpful conclusions from performance data. So just knowing the locations of wait states in the program is often insufficient to understand the reason for their occurrence. We therefore present a method for verifying hypotheses on causal connections between temporally or spatially distant performance phenomena without altering the application itself. The…

Cited by 22 publications (27 citation statements)
References 12 publications
“…It would therefore be desirable if the effects of redistributing a given delay could be automatically predicted and the expected savings be determined without altering the application itself. Since the effects of such changes are hard to quantify analytically, we plan to combine our delay analysis with a framework developed earlier by the authors [10], [24] that can simulate these changes via a real-time replay of event traces after they have been modified to reflect the redistributed load. First results indicate both high accuracy and good scalability, further application studies are in progress.…”
Section: Discussion (mentioning)
confidence: 99%
“…Furthermore, the statistical inference process may prove inaccurate for applications with highly time-dependent performance behavior [9]. Finally, the lack of traces precludes the later simulation of imbalance smoothing to narrow the space of potential optimizations, as proposed by Hermanns et al. [10].…”
Section: Introduction (mentioning)
confidence: 99%
“…PSiNS uses direct execution for MPI calls. SILAS (SImulation of LArge-Scale parallel applications) [11] is a parallel trace-based performance simulator for large scale target systems. SILAS focuses on the effects of fine-grain alterations of application-level behavior with respect to the performance under an identical execution configuration.…”
Section: Related Work (mentioning)
confidence: 99%
“…This trace replay is usually much faster than direct execution, as the computation and communications are not actually executed but abstracted as trace events. A number of tools [2], [24], [40], [32], [46], [23], including SMPI, support the off-line approach.…”
Section: The SMPI Framework (mentioning)
confidence: 99%