2002
DOI: 10.1145/773039.773043

Calculating stack distances efficiently

Abstract: This paper describes our experience using the stack processing algorithm [6] for estimating the number of cache misses in scientific programs. By using a new data structure and various optimization techniques, we obtain instrumented run-times within 50 to 100 times the original optimized run-times of our benchmarks.
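The stack distance at the heart of the algorithm is the number of distinct addresses referenced since the previous access to the same address, i.e. the depth of that address in an LRU stack; a first access has infinite distance. A minimal Python sketch of this definition (the naive quadratic version, not the paper's optimized data structure, which is what makes the reported 50 to 100x slowdowns achievable) might look like:

```python
def stack_distances(trace):
    """Return the LRU stack distance of each access in `trace`."""
    stack = []   # LRU stack of distinct addresses, most recent last
    dists = []
    for addr in trace:
        if addr in stack:
            # Depth below the top of the stack = number of distinct
            # addresses touched since the last access to `addr`.
            depth = len(stack) - 1 - stack.index(addr)
            dists.append(depth)
            stack.remove(addr)
        else:
            # Cold (first) access: infinite stack distance.
            dists.append(float("inf"))
        stack.append(addr)  # `addr` becomes most recently used
    return dists
```

For example, `stack_distances(['a', 'b', 'a', 'b', 'a'])` yields two infinite distances for the cold accesses followed by distance 1 for each reuse. The naive scan of the stack is what the paper's new data structure replaces to make the computation efficient.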

Cited by 53 publications (74 citation statements)
References 7 publications
“…To derive the number of cache misses, we will later use stack distance histograms [4]- [7] within our performance estimation framework. These stack distance histograms only depend on the number of cache sets and the cache line size (which we assume to be fixed).…”
Section: B. Stack Distance Histogram Computation
confidence: 99%
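As the quoted passage notes, one stack distance histogram determines the miss count for every cache size: under LRU, an access hits in a fully associative cache of C lines exactly when its stack distance is less than C. A small illustrative sketch of that property (an assumption-laden example, not code from the cited works):

```python
from collections import Counter

def lru_misses_from_histogram(hist, cache_lines):
    # Under LRU, an access hits in a fully associative cache of
    # `cache_lines` lines iff its stack distance < cache_lines, so a
    # single histogram yields the miss count for every cache size.
    return sum(count for dist, count in hist.items() if dist >= cache_lines)

# Example histogram for 5 accesses: two cold (infinite-distance)
# misses, two reuses at distance 1, one reuse at distance 3.
hist = Counter({1: 2, 3: 1, float("inf"): 2})
```

With this histogram, a 2-line cache misses on 3 of the 5 accesses, while a 4-line cache misses only on the 2 cold accesses; no re-simulation of the trace is needed per configuration.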
“…Estimation techniques based on the stack distance [3] provide more accuracy, but suffer from the memory overhead. Several methods have been proposed to enable an efficient computation [4]- [7] or approximation [8] of the stack distance. Stack distances and cache miss equations, however, only provide the number of misses and hits for different cache configurations.…”
Section: Introduction
confidence: 99%
“…Computation to determine the distribution can be reduced with efficient algorithms [5] or by approximate analysis [49]. Shi, et al [38] perform single-pass stack simulation to project cache performance and to study the impact of data replication for various L2 cache configurations.…”
Section: Related Work
confidence: 99%
“…As derived in Section II-D, our temporal locality essentially represents the same information as reuse distance histograms (cumulative distribution function vs. probability distribution function). Although we can leverage the previous works on fast computation of reuse distances [1], we aim to reduce the computation time by an order of magnitude to make it practical for compiler or runtime profile analysis. Based on the inherent data-level parallelism of our locality computation, we resort to parallel computation on graphics processing units (GPUs) [21].…”
Section: A. GPU-Based Parallel Algorithm for Locality Computation
confidence: 99%