Proceedings of the ACM/IEEE SC2004 Conference
DOI: 10.1109/sc.2004.21

Data Centric Cache Measurement on the Intel Itanium 2 Processor

Abstract: Processor speed continues to increase faster than the speed of access to main memory, making effective use of memory caches more important. Information about an application's interaction with the cache is therefore critical to performance tuning. To be most useful, tools that measure this information should relate it to the source code level data structures in an application. We describe how to gather such information by using hardware performance counters to sample cache miss addresses, and present a new tool…

Cited by 22 publications (9 citation statements)
References 18 publications
“…Buck and Hollingsworth developed Cache Scope [7] to perform data-centric analysis using Itanium 2 event address registers (EAR). Cache Scope associates latency with data objects and functions that accessed them.…”
Section: Tools For Identifying Poor Cache Locality (mentioning)
confidence: 99%
“…A variety of data-centric tools currently exist; we discuss them in detail in Section 6. Some tools focus on data locality in sequential codes [25,26,7]; the others focus on NUMA problems in threaded codes [28,20]; none of them supports comprehensive analysis of all kinds of data locality problems. Moreover, existing tools work on modest numbers of cores on a single node system; none of them tackles the challenge of scaling and is applicable across a cluster with many hardware threads on each node.…”
Section: Introduction (mentioning)
confidence: 99%
“…Tikir et al describe a profile-driven online page migration scheme using hardware performance counters [Mustafa M. Tikir 2004]. Buck et al use the Itanium-2 data tracing PMU support to associate load misses to source code lines and data structures in uniprocessor programs [Buck and Hollingsworth 2004]. Buck et al also compare different hardware mechanism for detecting uniprocessor memory hierarchy bottlenecks [Buck and Hollingsworth 2000b].…”
Section: Related Work (mentioning)
confidence: 99%
“…Tikir et al describe a profile-driven online page migration scheme using hardware performance counters [24]. Buck et al use the Itanium-2 data tracing PMU support to associate load misses to source code lines and data structures in uniprocessor programs [7]. Buck et al also compare different hardware mechanism for detecting uniprocessor memory hierarchy bottlenecks [6].…”
Section: Related Work (mentioning)
confidence: 99%