2007 IEEE International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2007.370536

Load Miss Prediction - Exploiting Power Performance Trade-offs

Abstract: Modern CPUs operate at GHz frequencies, but memory-access latencies remain relatively large, on the order of hundreds of cycles. Deeper cache hierarchies with larger cache sizes can mask these latencies for codes with good data locality and reuse, such as structured dense matrix computations. However, cache hierarchies do not necessarily benefit sparse scientific computing codes, which tend to have limited data locality and reuse. We therefore propose a new memory architecture with a Load …


Cited by 8 publications (6 citation statements)
References 21 publications (26 reference statements)
“…We use SimpleScalar configured to accept PISA compiled programs to model a single-core processor (such as the one in BlueGene [18]), starting from a PowerPC440 embedded core. We use Wattch [2] to calculate the power consumption with extrapolations for 0.13 µm technology [11], [15], [16]. We also developed a DDR2-type memory performance and power simulator for use with our modified versions of SimpleScalar and Wattch.…”
Section: Methods (mentioning)
confidence: 99%
“…We discuss how memory optimizations that we have developed earlier [11], [15], [16] can affect the performance of tuned and un-tuned versions of sparse matrix-vector multiplication. We consider the use of such optimizations with power-saving modes of the hardware, such as Dynamic Voltage and Frequency Scaling (DVFS) [5], to improve performance at significantly lower power levels.…”
Section: Introduction (mentioning)
confidence: 99%
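The DVFS trade-off invoked in the statement above can be illustrated with a toy two-phase workload model: dynamic power scales roughly with V²·f, while memory-stall time does not shrink as the core clock rises, so memory-bound code can run at a lower frequency and voltage for much less energy at a modest runtime cost. This is a hypothetical sketch with made-up numbers, not the paper's model or results.

```python
def energy_joules(freq_ghz, volt, cpu_bound_s, mem_bound_s, k=1.0):
    """Energy = power * time for a simple two-phase workload model.

    cpu_bound_s: compute time at 1 GHz (scales inversely with frequency)
    mem_bound_s: memory-stall time (independent of core frequency)
    k: technology-dependent capacitance constant (arbitrary units)
    """
    runtime = cpu_bound_s / freq_ghz + mem_bound_s
    power = k * volt ** 2 * freq_ghz  # dynamic power ~ C * V^2 * f
    return power * runtime

# A memory-bound workload: 1 s of compute (at 1 GHz), 4 s of stalls.
e_fast = energy_joules(2.0, 1.2, cpu_bound_s=1.0, mem_bound_s=4.0)
e_slow = energy_joules(1.0, 0.9, cpu_bound_s=1.0, mem_bound_s=4.0)
print(e_slow < e_fast)  # halving f (and lowering V) saves energy here
```

In this example the slower setting stretches runtime from 4.5 s to 5.0 s but cuts modeled energy by roughly two thirds, which is the shape of trade-off the citing work exploits.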
“…Among works using parallel creation of requests is [23]. Although it does not use an inclusive cache, that work uses a predictor to decide whether or not to request a datum directly from main memory.…” [translated from Portuguese]
Section: Related Work (unclassified)
“…For example, data can be prefetched into dead blocks and, while replacing, first preference can be given to dead blocks. The energy overhead of CBTs (e.g., due to predictors) can be offset by using the dynamic voltage/frequency scaling (DVFS) technique [70].…”
Section: Adaptive Bypassing (mentioning)
confidence: 99%
“…Predictor organization: Many CBTs use predictors (e.g., dead block predictors) for storing metadata and making bypassing decisions. The predictors indexed by PC of memory instructions incur less overhead than those indexed by addresses [20,23,35,39,53,61,70,71].…”
Section: Probabilistic Bypassing (mentioning)
confidence: 99%
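The last statement describes PC-indexed bypass predictors: a small table of saturating counters indexed by the program counter of the load, which is cheaper than indexing by data address because far fewer distinct PCs exist. The sketch below is an illustrative minimal design; the table size, counter width, and thresholds are arbitrary choices, not taken from any cited predictor.

```python
class BypassPredictor:
    """Toy PC-indexed cache-bypass predictor using saturating counters."""

    def __init__(self, entries=256, bits=2):
        self.entries = entries
        self.max_count = (1 << bits) - 1
        # Initialize weakly toward caching (counter at the midpoint).
        self.table = [self.max_count // 2] * entries

    def _index(self, pc):
        # Hashing the PC needs far fewer bits than hashing a data address.
        return pc % self.entries

    def should_bypass(self, pc):
        """Predict bypass when the counter says blocks from this PC die unused."""
        return self.table[self._index(pc)] < self.max_count // 2

    def update(self, pc, block_was_reused):
        """Train on the observed outcome: reuse -> cache, dead -> bypass."""
        i = self._index(pc)
        if block_was_reused:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

p = BypassPredictor()
for _ in range(3):
    p.update(0x400A10, block_was_reused=False)  # blocks from this PC are dead
print(p.should_bypass(0x400A10))  # True
```

A dead-block predictor in the same family would be trained on whether an evicted block was ever re-referenced; the counter mechanics are identical, only the training signal changes.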