Proceedings of the 23rd International Conference on Parallel Architectures and Compilation 2014
DOI: 10.1145/2628071.2628082

Trading cache hit rate for memory performance

Abstract: Most prior compiler-based data-locality optimization work targets cache locality exclusively; row-buffer locality in DRAM banks has received much less attention. In particular, to the best of our knowledge, there is no compiler-based approach that improves row-buffer locality when executing irregular applications. This is a critical problem, considering that executing irregular applications in a power- and performance-efficient manner will be a key requirement to extrac…
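
To make the abstract's problem concrete, here is a minimal sketch (not taken from the paper; all names are illustrative) of the kind of indirect loop the abstract calls "irregular". Because the address stream is driven by an index array, consecutive iterations touch arbitrary DRAM rows, so each bank's row buffer is opened and closed almost every access:

```c
/* Minimal sketch (illustrative, not from the paper): an irregular
 * loop whose data accesses are driven by an index array. */
#include <stddef.h>

void gather_add(double *restrict out,
                const double *restrict data, /* large, exceeds the LLC */
                const int *restrict idx,     /* arbitrary permutation */
                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* data[idx[i]] is a random-looking address stream: poor
         * cache locality AND poor DRAM row-buffer locality. */
        out[i] += data[idx[i]];
    }
}
```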

Cited by 2 publications (2 citation statements) · References 38 publications

“…Expensive locality reorganization methods have not been able to amortize their costs within a single iteration, and they are limited to applications that repeatedly process the same references, e.g., rearranging the index array [17,32] and remapping all arrays in a loop, graph partitioning [22,23], or cheaper reorderings with lower benefits (e.g., space-filling curves). Recent inspector/executor work [18] traded a lower cache hit rate for more DRAM row-buffer hits, for 14% net gains. Milk achieves up to 4× gains on static reference loops, and pays off in one iteration, which also allows dynamic references.…”
Section: Related Work
Mentioning confidence: 99%
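
The trade-off this citation statement attributes to the inspector/executor work [18] can be sketched as follows. This is an illustrative reconstruction, not the paper's actual transformation: ROW_BYTES and the linear address-to-row mapping are simplifying assumptions (real memory controllers interleave banks and channels), and the function names are hypothetical. The inspector groups loop iterations by the DRAM row their data element falls in, so the executor streams through one open row at a time, even though the reordered writes to out[] lose some cache locality:

```c
#include <stdint.h>
#include <stdlib.h>

#define ROW_BYTES 8192          /* assumed row-buffer size */

static const double *g_data;    /* shared with the qsort comparator */
static const int *g_idx;

static int by_row(const void *a, const void *b)
{
    uintptr_t ra = (uintptr_t)&g_data[g_idx[*(const size_t *)a]] / ROW_BYTES;
    uintptr_t rb = (uintptr_t)&g_data[g_idx[*(const size_t *)b]] / ROW_BYTES;
    return (ra > rb) - (ra < rb);
}

void inspect_execute(double *out, const double *data,
                     const int *idx, size_t n)
{
    /* Inspector: order iterations by DRAM row.  This cost is paid
     * once and reused across repeated executions of the loop. */
    size_t *order = malloc(n * sizeof *order);
    for (size_t i = 0; i < n; i++) order[i] = i;
    g_data = data; g_idx = idx;
    qsort(order, n, sizeof *order, by_row);

    /* Executor: out[] is now written out of order (worse cache hit
     * rate), but data[] is read row by row (more row-buffer hits). */
    for (size_t k = 0; k < n; k++) {
        size_t i = order[k];
        out[i] += data[idx[i]];
    }
    free(order);
}
```

The inspection cost here is a full sort, which is why, as the statement notes, such methods only pay off when the same reference pattern is executed many times.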
“…In Delivery, each partition's deferred updates are read from DRAM and processed, along with dependent statements. These three logical phases are similar to inspector-executor style optimizations [18,32,40]; in Milk, however, to eliminate materialization of partitions and conserve DRAM bandwidth, the phases are fused and run as coroutines. Prior research either focused on expensive preprocessing that resulted in a net performance gain only when amortized over many loop executions, or explored simple inspection for correspondingly modest gains.…”
Section: Introduction
Mentioning confidence: 99%
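
The collection/delivery idea this statement describes can be sketched in simplified form. This is an assumption-laden illustration: the citing work reportedly fuses the phases as coroutines precisely to avoid materializing the partition buffers built below, whereas this sketch runs them sequentially for clarity, and NPART and all names are hypothetical. Updates are first deferred into per-partition buffers, each covering a cache-sized slice of the accumulator, and each partition is then drained while its targets are cache-resident:

```c
#include <stdlib.h>

#define NPART 64                      /* assumed partition count */

typedef struct { int dst; double val; } update_t;

void partitioned_scatter_add(double *acc, size_t acc_len,
                             const int *dst, const double *val, size_t n)
{
    if (acc_len == 0 || n == 0) return;
    size_t span = (acc_len + NPART - 1) / NPART;

    /* Collection: defer each update to the partition owning dst[i]. */
    update_t **buf = malloc(NPART * sizeof *buf);
    size_t *cnt = calloc(NPART, sizeof *cnt);
    for (int p = 0; p < NPART; p++)
        buf[p] = malloc(n * sizeof **buf);   /* worst-case sizing */
    for (size_t i = 0; i < n; i++) {
        int p = (int)(dst[i] / span);
        buf[p][cnt[p]++] = (update_t){ dst[i], val[i] };
    }

    /* Delivery: drain one partition at a time; all of its targets
     * lie in a single cache-sized slice of acc[]. */
    for (int p = 0; p < NPART; p++) {
        for (size_t k = 0; k < cnt[p]; k++)
            acc[buf[p][k].dst] += buf[p][k].val;
        free(buf[p]);
    }
    free(buf); free(cnt);
}
```

The explicit buffers make the DRAM-bandwidth cost of materializing partitions visible: every deferred update is written out and read back once, which is exactly what the fused coroutine formulation is said to eliminate.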