2016
DOI: 10.1002/cpe.4064

Efficient and high‐quality sparse graph coloring on GPUs

Abstract: Graph coloring has been broadly used to discover concurrency in parallel computing. To speed up graph coloring for large‐scale datasets, parallel algorithms have been proposed to leverage modern GPUs. Existing GPU implementations either have limited performance or yield unsatisfactory coloring quality (too many colors assigned). We present a work‐efficient parallel graph coloring implementation on GPUs with good coloring quality. Our approach uses the speculative greedy scheme, which inherently yields …
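The speculative greedy scheme the abstract names is a well-known iterate-and-repair pattern: all vertices simultaneously take the smallest color unused by their neighbors, adjacent vertices that picked the same color are flagged as conflicts, and only the conflicting vertices are re-colored in the next round. Below is a minimal sequential Python sketch of that pattern; the adjacency-dict format and function name are illustrative assumptions, not the paper's GPU implementation.

```python
def speculative_greedy_coloring(adj):
    """Iterate-and-repair coloring sketch (illustrative, not the
    paper's GPU code).

    adj: dict mapping each vertex to an iterable of its neighbors
         (a simple graph with no self-loops is assumed).
    Returns a dict mapping each vertex to a 0-based color.
    """
    color = {v: -1 for v in adj}   # -1 means "not yet colored"
    worklist = set(adj)            # vertices that still need (re)coloring

    while worklist:
        # Speculation: every worklist vertex picks the smallest color
        # not used by its neighbors, all reading the same pre-round
        # snapshot; this mimics the simultaneous parallel assignment
        # that makes the scheme speculative.
        snapshot = dict(color)
        for v in worklist:
            used = {snapshot[u] for u in adj[v] if snapshot[u] >= 0}
            c = 0
            while c in used:
                c += 1
            color[v] = c

        # Conflict detection: adjacent vertices that chose the same
        # color must be repaired; letting the smaller vertex id keep
        # its color guarantees the worklist shrinks every round.
        worklist = {max(u, v)
                    for v in worklist
                    for u in adj[v]
                    if color[u] == color[v]}

    return color


# Tiny usage example: prints a proper coloring of a 4-cycle.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(speculative_greedy_coloring(square))
```

Note that the repair rounds can cost extra colors relative to an optimal coloring (the 4-cycle above may end with three colors instead of two), which is exactly the coloring-quality concern the abstract raises.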

Cited by 14 publications (9 citation statements). References 61 publications.
“…Graph analytics have been widely applied in many applications [39,40]. In this paper, we have presented HPGraph, a GPU graph analytics framework which maps vertex programming to an optimized matrix backend.…”
Section: Discussion
confidence: 99%
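The HPGraph quote describes running a vertex-programming abstraction on a matrix backend. The standard correspondence behind such designs (used by GraphBLAS-style systems) is that one frontier-advance step of a vertex program equals a vector-matrix product over a Boolean semiring. A toy NumPy sketch of that equivalence follows; the 4-vertex graph and function names are hypothetical, and nothing here reflects HPGraph's actual API.

```python
import numpy as np

# Hypothetical directed 4-vertex graph as a 0/1 adjacency matrix
# (a real backend would use a sparse format such as CSR).
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])

def advance_vertex(frontier):
    """Vertex-programming view: each frontier vertex pushes to its
    out-neighbors."""
    out = np.zeros(len(A), dtype=bool)
    for v in np.flatnonzero(frontier):
        out |= A[v] > 0
    return out

def advance_matrix(frontier):
    """Matrix-backend view: the same step as one vector-matrix product
    over a Boolean semiring (nonzero after 0/1 arithmetic acts as OR)."""
    return (frontier.astype(int) @ A) > 0

f = np.array([True, False, False, False])
print(advance_vertex(f))   # [False  True False False]
print(advance_matrix(f))   # identical result
```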
“…The experimental results show that Feluca achieves up to 8.39×, 14.70×, 7.55×, and 9.70× speedup over Kokkos [20], Gunrock [36], SIRG [44], and ChenGC [42], [43], respectively. Table 4 shows that Feluca outperforms all other competitors in terms of run-time on all ten datasets.…”
Section: Comparison Against the State-of-the-Art Techniques
confidence: 99%
“…We compared Feluca with some state-of-the-art methods in this area, such as Kokkos [20], Gunrock [36], GraphBLAST [41], ChenGC [42], [43], SIRG [44], cuSPARSE [40], and JPL [40]. In this experiment, Feluca switches the execution stage by setting α to 10%.…”
Section: Comparison Against the State-of-the-Art Techniques
confidence: 99%
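The quote above notes that Feluca switches execution stages once a threshold α (10% in that experiment) is reached. As a rough illustration of that kind of two-stage design, the sketch below runs speculative coloring rounds while many vertices remain in conflict and finishes the small tail sequentially once the conflicting fraction drops below α. Only the α-controlled switch comes from the quote; the stage contents are an assumption for illustration, not Feluca's actual algorithm.

```python
def two_stage_coloring(adj, alpha=0.10):
    """Illustrative two-stage coloring with an alpha-controlled switch.

    Stage 1 uses speculative rounds (efficient while the worklist is
    large); stage 2 finishes the remaining conflicting tail one vertex
    at a time, which cannot introduce new conflicts. (A sketch of the
    general two-stage idea, not Feluca's GPU kernels.)
    """
    color = {v: -1 for v in adj}
    worklist = set(adj)
    n = len(adj)

    def smallest_free(v, view):
        used = {view[u] for u in adj[v] if view[u] >= 0}
        c = 0
        while c in used:
            c += 1
        return c

    # Stage 1: speculative rounds until the worklist falls below alpha*n.
    while len(worklist) > alpha * n:
        snapshot = dict(color)
        for v in worklist:
            color[v] = smallest_free(v, snapshot)
        worklist = {max(u, v) for v in worklist for u in adj[v]
                    if color[u] == color[v]}

    # Stage 2: sequential cleanup of the short tail.
    for v in worklist:
        color[v] = smallest_free(v, color)
    return color
```

The usual motivation for such a switch is that speculative rounds converge slowly on the last few conflicting vertices, so handing the tail to a cheap sequential pass avoids many near-empty parallel rounds.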
“…Compared to stochastic gradient descent (SGD) [8,9], the ALS algorithm is not only inherently parallel but can also incorporate implicit ratings [1]. Nevertheless, the ALS algorithm involves parallel sparse matrix manipulation [10], for which achieving high performance is challenging due to imbalanced workloads [11,12,13], random memory accesses [14,15], unpredictable amounts of computation [16], and task dependencies [17,18,19]. This particularly holds when parallelizing and optimizing ALS on modern multi-cores and many-cores [20].…”
Section: Introduction
confidence: 99%
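The last quote calls ALS inherently parallel: with one factor matrix fixed, every row of the other factor is an independent regularized least-squares solve. A minimal NumPy sketch of one ALS sweep follows; the dense rating matrix, variable names, and regularization constant are illustrative assumptions (real systems solve over the sparse observed entries only, which is where the load-imbalance and irregular-access issues the quote lists come from).

```python
import numpy as np

def als_sweep(R, U, V, lam=0.1):
    """One ALS sweep for the factorization R ~= U @ V.T.

    Each row update below is an independent ridge-regression solve,
    which is what makes ALS embarrassingly parallel across users and
    items. Dense illustrative version; not a production implementation.
    """
    k = U.shape[1]
    I = np.eye(k)
    # Fix V, solve for every user's factor vector independently.
    for i in range(U.shape[0]):
        U[i] = np.linalg.solve(V.T @ V + lam * I, V.T @ R[i])
    # Fix U, solve for every item's factor vector independently.
    for j in range(V.shape[0]):
        V[j] = np.linalg.solve(U.T @ U + lam * I, U.T @ R[:, j])
    return U, V

# Usage: a few sweeps drive down the reconstruction error.
rng = np.random.default_rng(0)
R = rng.random((6, 5))
U, V = rng.random((6, 3)), rng.random((5, 3))
for _ in range(20):
    U, V = als_sweep(R, U, V)
print(np.linalg.norm(R - U @ V.T))
```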