2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2019.00051
Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads

Cited by 66 publications (29 citation statements)
References 46 publications
“…Across a complete set of 29 workloads, we show a significant average speedup of 2.6× and energy savings of 1.6× compared to a non-prefetching baseline. Using our evaluation framework, we further show that Prodigy outperforms IMP [99], Ainsworth and Jones' prefetcher [6], and DROPLET [15] by 2.3×, 1.5×, and 1.6×, respectively. The compact DIG representation allows Prodigy to achieve high speedups at a mere 0.8KB of hardware storage overhead.…”
Section: Introduction
confidence: 83%
“…Hardware prefetchers rely on capturing memory access patterns using explicit programmer support [5], [6], learning techniques [77], and intelligent hardware structures [99]. Limitations of these approaches include their limited applicability to a subset of data structures and indirect memory access patterns [6], [15], [99] or high complexity and hardware cost to support generalization [5], [77]. While software prefetching [7] can exploit static semantic view of algorithms, it lacks dynamic run-time information and struggles to maintain prefetch timeliness.…”
Section: Introduction
confidence: 99%
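To make the indirect-access limitation concrete, the following is a minimal C sketch, not taken from any of the cited papers; the array names and the PF_DIST look-ahead are illustrative assumptions. It shows the A[B[i]]-style pattern that stride-based hardware prefetchers miss, and a software-prefetch variant whose benefit hinges on choosing the look-ahead distance well, which is exactly the timeliness problem the statement raises for software prefetching [7].

```c
/* Illustrative sketch only (not from the cited papers): the indirect access
 * pattern A[B[i]] and a software-prefetch variant. PF_DIST is a hypothetical
 * tuning knob; picking it well is the "timeliness" issue mentioned above. */
#include <stddef.h>
#include <stdint.h>

#define PF_DIST 16  /* assumed prefetch look-ahead, in elements */

/* Sum vertex_data over a neighbor list: the load of neighbors[i] must
 * complete before the dependent, irregular load of vertex_data[...] can issue. */
double sum_neighbors(const uint32_t *neighbors, size_t n,
                     const double *vertex_data)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n) {
            /* Prefetch the irregular target a few iterations ahead
             * (GCC/Clang builtin; read access, low temporal locality). */
            __builtin_prefetch(&vertex_data[neighbors[i + PF_DIST]], 0, 1);
        }
        acc += vertex_data[neighbors[i]];  /* indirect, cache-unfriendly load */
    }
    return acc;
}
```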
“…In Aggregation phase, the Computation Unit Utilization is only 50% and the Executed IPC is only 1.78 on average as shown in Table 3. The aggregation heavily relies on the graph structure so that it is obstructed by irregularity [8] and load-load data dependency chain [11]. Therefore, it is mainly stalled for Data Request and Execution Dependency as depicted in Fig.…”
Section: Analysis Of Overall Execution
confidence: 99%
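For context, here is a minimal C sketch of such an aggregation loop over a CSR graph; it is illustrative only, and the kernel and array names are assumptions rather than the cited system's code. The chain row_ptr[v] -> col_idx[e] -> features[u] makes each load's address depend on the value returned by the previous load, which is the load-load dependency chain and graph-structure irregularity that the statement identifies as the source of Data Request and Execution Dependency stalls.

```c
/* Illustrative sketch (assumed CSR layout, not the cited system): sum-style
 * feature aggregation. Each neighbor's feature address depends on col_idx[e],
 * which in turn depends on row_ptr[v], so the loads form a dependency chain
 * with irregular, cache-unfriendly targets. */
#include <stddef.h>
#include <stdint.h>

void aggregate(size_t num_vertices, int feat_dim,
               const uint32_t *row_ptr,   /* CSR offsets, length num_vertices+1 */
               const uint32_t *col_idx,   /* CSR neighbor ids                   */
               const float *features,     /* num_vertices x feat_dim            */
               float *out)                /* num_vertices x feat_dim            */
{
    for (size_t v = 0; v < num_vertices; v++) {
        for (int d = 0; d < feat_dim; d++)
            out[v * feat_dim + d] = 0.0f;

        /* load 1: neighbor range of v */
        for (uint32_t e = row_ptr[v]; e < row_ptr[v + 1]; e++) {
            uint32_t u = col_idx[e];                   /* load 2: neighbor id   */
            const float *fu = &features[(size_t)u * (size_t)feat_dim];
            for (int d = 0; d < feat_dim; d++)
                out[v * feat_dim + d] += fu[d];        /* load 3: depends on u  */
        }
    }
}
```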