Proceedings of the Fourth Annual IEEE International Workshop on Workload Characterization. WWC-4 (Cat. No.01EX538)
DOI: 10.1109/wwc.2001.990754

Modeling application performance by convolving machine signatures with application profiles

Abstract: This paper presents a performance modeling methodology that is faster than traditional cycle-accurate simulation, more sophisticated than performance estimation based on system peak-performance metrics, and is shown to be effective on a class of High Performance Computing benchmarks. The method yields insight into the factors that affect performance on single-processor and parallel computers.

Cited by 55 publications (57 citation statements)
References 20 publications
“…This can be achieved by building a performance model that predicts the effectiveness of communication-reduction techniques under given platform properties and application characteristics. Such performance models have been constructed for other high-performance computing applications in the past, both on the application computation performance [30,31] and on the message passing performance [32,33]. However, it is challenging to build accurate performance models for irregular applications such as the parallel sparse LU factorization because their data structures and execution behaviors are hard to predict.…”
Section: Runtime Application Adaptation (mentioning)
confidence: 99%
“…Snavely et al. use profile convolving [1], a trace-based method that involves the creation of a machine profile and an application profile. Machine profiles describe the behavior of loads and stores for the given processor, while the application profile is produced by a runtime utility that captures and statistically records all memory references.…”
Section: Previous Work (mentioning)
confidence: 99%
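
The idea of combining a machine profile with an application profile can be illustrated with a minimal sketch. It is not the cited authors' implementation: the class names, the operation classes (`l1_load`, `mem_load`), and the rule "time = count / rate, summed over classes" are all assumptions chosen only to make the statement above concrete.

```python
from dataclasses import dataclass


@dataclass
class MachineProfile:
    # Hypothetical: sustained rate (operations per second) the machine
    # achieves for each class of memory operation.
    rates: dict


@dataclass
class ApplicationProfile:
    # Hypothetical: number of memory operations of each class recorded
    # while tracing the application.
    counts: dict


def convolve(machine: MachineProfile, app: ApplicationProfile) -> float:
    """Estimate run time in seconds by summing count / rate per class.

    Assumes the classes do not overlap in time, which is a simplification.
    """
    total_seconds = 0.0
    for op_class, count in app.counts.items():
        total_seconds += count / machine.rates[op_class]
    return total_seconds


# Illustrative numbers only; they are not measurements from any system.
machine = MachineProfile(rates={"l1_load": 2e9, "mem_load": 1e8})
app = ApplicationProfile(counts={"l1_load": 4_000_000_000, "mem_load": 100_000_000})
estimate = convolve(machine, app)
```

Under these made-up inputs the estimate is the sum of two seconds for cache-resident loads and one second for loads from main memory.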
“…We evaluate the performance model in which we use true hardware counters through PAPI [2] to predict the performance (henceforth called the PAPI model) and compare it to the model in which we use estimates of the lower and upper bounds of cache and TLB misses (henceforth termed the analytic lower and upper bound models). The cache and memory latencies were derived [15] from published processor manuals, curve fitting, and experimental work using the Saavedra-Barrera memory system microbenchmark [10] and MAPS benchmarks [11]. Due to space limitations we present a summary of the full data [8].…”
Section: Verification of the Analytic Model (mentioning)
confidence: 99%
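
To show how miss counts and latencies could yield lower and upper bounds on predicted time, here is a rough sketch. The additive cost formula, the function name `predicted_cycles`, and every latency and miss count are hypothetical. The citing paper's actual model is not given in the excerpt above.

```python
def predicted_cycles(accesses, l1_misses, l2_misses, tlb_misses, latency):
    """Hypothetical additive model: every access pays the L1 hit latency,
    and each miss adds the latency of the next level it must reach."""
    return (
        accesses * latency["l1_hit"]
        + l1_misses * latency["l2_hit"]
        + l2_misses * latency["memory"]
        + tlb_misses * latency["tlb_miss"]
    )


# Illustrative latencies in cycles (assumed, not taken from any manual).
latency = {"l1_hit": 1, "l2_hit": 10, "memory": 100, "tlb_miss": 30}
accesses = 1000

# Analytic bounds use estimated miss counts; the counter-based prediction
# would use measured counts instead (made-up values here).
lower = predicted_cycles(accesses, 10, 2, 1, latency)
upper = predicted_cycles(accesses, 50, 20, 5, latency)
counter_based = predicted_cycles(accesses, 30, 10, 3, latency)
```

In this sketch the counter-based prediction falls between the two analytic bounds because its miss counts lie between the lower and upper estimates.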