2014 IEEE International Symposium on Workload Characterization (IISWC)
DOI: 10.1109/iiswc.2014.6983053

Graph processing on GPUs: Where are the bottlenecks?

Cited by 72 publications (41 citation statements)
References 26 publications
“…Xu et al [16] studied 12 graph applications in order to identify bottlenecks that limit GPU performance. They show that graph applications tend to need frequent kernel invocations and make ineffective use of caches compared to non-graph applications.…”
Section: Related Work
confidence: 99%
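The "frequent kernel invocations" bottleneck quoted above comes from level-synchronous traversal: each frontier expansion is typically one GPU kernel launch, so the launch count grows with the graph's diameter. A minimal Python sketch (an illustration, not code from the cited paper) that counts those would-be launches:

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: each while-loop iteration corresponds to
    one GPU kernel launch in a typical implementation."""
    dist = {source: 0}
    frontier = [source]
    launches = 0
    while frontier:
        launches += 1  # one kernel launch per frontier expansion
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return dist, launches

# A path graph maximizes diameter: a 10-vertex path needs 10 launches,
# one per BFS level, each doing very little work.
path = {i: [i + 1] for i in range(9)}
path[9] = []
dist, launches = bfs_levels(path, 0)
```

High-diameter, low-degree graphs thus pay launch and CPU-synchronization overhead on every level, which is one reason the cited study finds kernel invocation frequency to be a dominant cost.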
“…Workload distribution and load balancing are crucial issues for performance; previous work has observed that these operations are dependent on graph structure [17,16]. Hardwired graph primitive implementations have prioritized efficient (and primitive-customized) implementations of these operations, thus to be competitive, high-level programmable frameworks must offer high-performance but high-level strategies to address them.…”
Section: Critical Aspects For Efficiency
confidence: 99%
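The load-balancing problem described above can be made concrete: under a naive vertex-per-thread mapping, a warp runs at the speed of its highest-degree vertex, so skewed degree distributions waste most lanes. A hedged Python sketch (the imbalance metric is an illustrative assumption, not from the cited works) that models this:

```python
def thread_imbalance(degrees, warp_size=32):
    """Model naive vertex-per-thread mapping: a warp finishes only when
    its slowest lane does, so warp cost = max degree in the warp times
    the lane count. Returns occupied-lane time / useful work (1.0 = balanced)."""
    work = 0
    occupied = 0
    for i in range(0, len(degrees), warp_size):
        warp = degrees[i:i + warp_size]
        work += sum(warp)
        occupied += max(warp) * len(warp)
    return occupied / work

# Uniform degrees are perfectly balanced; a power-law-like warp with one
# hub vertex leaves 31 lanes nearly idle.
uniform = [8] * 32
skewed = [1] * 31 + [1000]
```

This is why hardwired primitives use degree-aware scheduling (per-thread, per-warp, per-block work assignment), and why programmable frameworks must offer equivalent high-level strategies to stay competitive.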
“…We also note that the algorithm of Auer and Bisseling repeatedly considers all the vertices and is therefore not (work) efficient, but scales better. Xu et al use the algorithm of Auer and Bisseling, along with several other graph algorithms [21]. We address some of the performance issues raised by them in our work.…”
Section: Related Work
confidence: 99%
“…The Nvidia Kepler K40 presented in Section 3 is currently one of the best manycore platforms for scientific computing. While many significant performance gains for compute intensive applications with regular and predictable memory access patterns have been demonstrated using GPUs, the efficient implementation of irregular applications such as graph algorithms remains a challenge [21]. Highly irregular degree distributions, poor locality in memory accesses, and minimal computation on accessed data make efficient utilization of compute resources challenging.…”
Section: GPU-Suitor-Hybrid
confidence: 99%
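The "poor locality in memory accesses" noted above arises in the standard CSR (compressed sparse row) layout: neighbor IDs drive data-dependent, scattered reads into vertex data, which defeats coalescing and caching. A small Python sketch of a CSR neighbor gather (an illustrative example, not taken from the cited paper):

```python
def csr_gather(row_ptr, col_idx, values):
    """Sum each vertex's neighbor values from a CSR graph.
    col_idx entries are arbitrary vertex IDs, so the reads into
    `values` are scattered and data-dependent (cache-unfriendly)."""
    out = []
    for u in range(len(row_ptr) - 1):
        neighbors = col_idx[row_ptr[u]:row_ptr[u + 1]]
        out.append(sum(values[v] for v in neighbors))
    return out

# 3-vertex graph: vertex 0 -> {1, 2}, vertex 1 -> {0}, vertex 2 -> {0, 1}.
out = csr_gather([0, 2, 3, 5], [1, 2, 0, 0, 1], [10, 20, 30])
```

The `col_idx` slice itself streams contiguously, but the indirect reads `values[v]` jump across memory in input-dependent order, which is the "minimal computation on accessed data" pattern the quoted passage identifies.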
“…A continuation of that research uses a software simulator to change GPU architectural parameters and observes performance is more sensitive to L2 cache parameters than to DRAM parameters, which suggests there is exploitable locality [37]. Xu et al also use a simulator and identify synchronization with the CPU (kernel invocations and data transfers) as well as GPU memory latency to be the biggest performance bottlenecks [44]. Che et al profile the Pannotia suite of graph algorithms and observe substantial diversity across algorithms and inputs [10].…”
Section: Related Work
confidence: 99%