2009
DOI: 10.1002/cpe.1553

HPCTOOLKIT: tools for performance analysis of optimized parallel programs

Abstract: HPCTOOLKIT is an integrated suite of tools that supports measurement, analysis, attribution, and presentation of application performance for both sequential and parallel programs. HPCTOOLKIT can pinpoint and quantify scalability bottlenecks in fully optimized parallel programs with a measurement overhead of only a few percent. Recently, new capabilities were added to HPCTOOLKIT for collecting call path profiles for fully optimized codes without any compiler support, pinpointing and quantifying bottlenec…

Cited by 413 publications (295 citation statements)
References 40 publications
“…Burtscher et al [10] designed Perfexpert to automate identifying the performance bottlenecks of HPC applications with predefined rules. Adhianto et al [3] designed HPCToolkit to measure hardware events and to correlate the events with source code to identify performance bottlenecks of parallel applications. The detection mechanisms of these tools were heavily dependent on manually created metrics and rules.…”
Section: Related Work (mentioning)
confidence: 99%
“…1.5(a, c, e) show the average execution time of checkpoints 25, 31 and 36 for the number of saved objects, and the linear relation between the average execution time and the number of saved objects. It shows the performance bottleneck in these checkpoints when computed with the large number of stored objects. This is because the large number of saved objects requires more comparisons and computation.…”
Section: Analysis Of Saved Objects (mentioning)
confidence: 99%
“…Based on this assessment, the compiler skips the instrumentation of those functions that are either short or called within nested loops. However, generally not instrumenting small functions was criticized by Adhianto et al [1]. They argue that small functions often play a significant role, for example, if they include synchronization calls important to parallel performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…Furthermore, some higher level analysis tools gather additional information by combining the HPM counts with application level traces. Popular representatives of that analysis method are HPCToolkit [1], PerfSuite [10], Open|Speedshop [16] or Scalasca [3]. The intention of these tools is to advise the application developer with educated optimization hints.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%