ACM/IEEE SC 2000 Conference (SC'00) 2000
DOI: 10.1109/sc.2000.10034

Using Hardware Performance Monitors to Isolate Memory Bottlenecks

Abstract: In

Cited by 21 publications (18 citation statements)
References 11 publications
“…Buck et al use the Itanium-2 data tracing PMU support to associate load misses to source code lines and data structures in uniprocessor programs [Buck and Hollingsworth 2004]. Buck et al also compare different hardware mechanism for detecting uniprocessor memory hierarchy bottlenecks [Buck and Hollingsworth 2000b]. Satoh et al study data-flow techniques to analyze data sharing patterns at compile time for OpenMP programs [Sato et al 1999].…”
Section: Related Work (mentioning)
confidence: 99%
“…Buck et al use the Itanium-2 data tracing PMU support to associate load misses to source code lines and data structures in uniprocessor programs [7]. Buck et al also compare different hardware mechanism for detecting uniprocessor memory hierarchy bottlenecks [6]. Satoh et al study dataflow techniques to analyze data sharing patterns at compile time for OpenMP programs [28].…”
Section: Related Work (mentioning)
confidence: 99%
“…These facilities could enable presentation of data about cache behavior in terms of program data structures at the source code level. Work reported in [3] has shown that such information can be extremely useful in identifying performance bottlenecks caused by bad cache behavior. In [3], the data were obtained through use of a cache simulator which runs considerably slower than the original application (e.g., by a couple of orders of magnitude) and does not model details such as pipelining and multiple instruction issue.…”
Section: Implications For PAPI (mentioning)
confidence: 99%
“…Work reported in [3] has shown that such information can be extremely useful in identifying performance bottlenecks caused by bad cache behavior. In [3], the data were obtained through use of a cache simulator which runs considerably slower than the original application (e.g., by a couple of orders of magnitude) and does not model details such as pipelining and multiple instruction issue. Through use of appropriate hardware support (e.g., as on the Itanium), similar data could be obtained more accurately and efficiently.…”
Section: Implications For PAPI (mentioning)
confidence: 99%