1996
DOI: 10.1145/233008.233035

Integrating performance monitoring and communication in parallel computers

Abstract: A large and increasing gap exists between processor and memory speeds in scalable cache-coherent multiprocessors. To cope with this situation, programmers and compiler writers must increasingly be aware of the memory hierarchy as they implement software. Tools to support memory performance tuning have, however, been hobbled by the fact that it is difficult to observe the caching behavior of a running program. Little hardware support exists specifically for observing caching behavior; furthermore, what support …

Citations: cited by 16 publications (12 citation statements)
References: 20 publications (4 reference statements)
“…Hardware support for examining the contents of CPU caches directly would make this task straightforward, and improve its precision. FlashPoint [21] is one example of a possible hardware change that can make tracking cache misses easier. DProf is also limited by having access to only four debug registers for tracing memory accesses through application code.…”
Section: Discussion (mentioning)
confidence: 99%
“…The authors of FlashPoint [21] propose an interface that allows the programmer to tell the hardware what data they are interested in profiling. The hardware directly collects fine grained statistics and classifies cache misses.…”
Section: Data Profilers (mentioning)
confidence: 99%
“…The result is presented at the procedure and data-structure level and indicates whether the misses were caused by communication or not. The FlashPoint tool [12] gathers similar information using the programmable cache-coherence controllers in the FLASH multiprocessor computer. CPROF [8] uses a binary executable editor to insert calls to a cache simulator for every load and store instruction.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, the real power of Tempest lies in the opportunity to optimize performance using customized coherence protocols tailored to specific data structures and specific phases within an application. Tempest also aids the optimization process itself by enabling extended coherence protocols that collect profiling information [34], [57]. This information drives high-level tools that help programmers identify and understand performance bottlenecks that may benefit from custom protocols.…”
Section: Optimizing Communication Using Tempest (mentioning)
confidence: 99%