Proceedings of the 19th Annual International Conference on Supercomputing 2005
DOI: 10.1145/1088149.1088153

A hybrid hardware/software approach to efficiently determine cache coherence bottlenecks

Abstract: High-end computing increasingly relies on shared-memory multiprocessors (SMPs), such as clusters of SMPs, nodes of chip multiprocessors (CMPs) or large-scale single-system image (SSI) SMPs. In such systems, performance is often affected by the sharing pattern of data within applications and its impact on cache coherence. Sharing patterns that result in frequent invalidations followed by subsequent coherence misses create cache coherence bottlenecks with significant performance penalties. Past work on identifying…

Cited by 11 publications (9 citation statements)
References 26 publications (24 reference statements)
“…The NAS benchmarks are C versions of the original NAS-2.3 serial benchmarks [4] provided by the Omni Compiler group [5]. EP is not evaluated since it does not have significant sharing of data [6]. In addition, the 320.equake and 332.ammp benchmarks from the SPEC OMPM2001 benchmark set were assessed in the results.…”
Section: Trace-guided Page Placement (mentioning)
confidence: 99%
“…RSDs were originally proposed to track inter-procedural side effects on common substructures of arrays to promote compiler-aided parallelization [15]. Marathe et al adapted the RSD representation and proposed PRSDs for memory trace compression [21,20]. Budanur et al further designed Extended-PRSDs to perform multilevel scalable parallel memory tracing in SCALAMEMTRACE [3].…”
Section: Related Work (mentioning)
confidence: 99%
“…Our work differs in that [it] further develops concepts of in-situ compression from ScalaTrace [22] and METRIC [17,20,15,16,18,19]. ScalaTrace addresses intra-task and inter-process compression of communication traces, but not memory traces.…”
Section: Related Work (mentioning)
confidence: 99%