SC20: International Conference for High Performance Computing, Networking, Storage and Analysis 2020
DOI: 10.1109/sc41405.2020.00076

GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks

Cited by 72 publications (49 citation statements)
References 14 publications
Citation types: 0 supporting, 49 mentioning, 0 contrasting
“…In the following, we review a comprehensive selection of software frameworks and accelerators, listed in Table 7. The analysis does not include GunRock [154] or GE-SpMM [74] for different reasons. GunRock, despite implementing GraphSAGE in its latest versions, is a graph processing library that does not exploit intra-vertex parallelism.…”
Section: Software Framework and Accelerators
Citation type: mentioning
confidence: 99%
“…Our recent work on distributed-memory GNN training (Tripathy et al, 2020) showed that communication-avoiding algorithms greatly accelerate GNN training at the expense of increased memory requirements. The primary workhorse of GNN training and inference is the sparse matrix-dense matrix product (Yang et al, 2018; Huang et al, 2020). The algorithmic research on marginalized graph kernels and communication-avoiding distributed GNN training has been primarily supported by the ASCR Applied Math program.…”
Section: Algebraic Approaches For Graph Algorithms and Combinatorial Problems
Citation type: mentioning
confidence: 99%
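As context for the excerpt above: the sparse matrix-dense matrix product (SpMM) it calls the workhorse of GNN training computes C = A * B, where A is the sparse graph adjacency matrix and B is a dense feature matrix. Below is a minimal CUDA sketch of that operation, assuming A in CSR format and row-major dense operands; it shows what SpMM computes, not the optimized GE-SpMM kernel, which layers shared-memory row caching and warp merging on top of this basic pattern.

// Minimal CSR SpMM sketch (illustrative, not the GE-SpMM kernel):
// computes C = A * B with A sparse (m x k, CSR) and B dense (k x n, row-major).
// One thread per output element of C (m x n, row-major).
__global__ void csr_spmm(int m, int n,
                         const int *row_ptr, const int *col_idx,
                         const float *val,          // CSR arrays of A
                         const float *B, float *C)  // dense operands
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m || col >= n) return;

    float acc = 0.0f;
    // Each non-zero A[row][j] contributes A[row][j] * B[j][col].
    for (int p = row_ptr[row]; p < row_ptr[row + 1]; ++p)
        acc += val[p] * B[col_idx[p] * n + col];
    C[row * n + col] = acc;
}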
“…Then, we use the loaded data (in shared memory) to calculate the corresponding tile of the dense output feature matrix. As noted in existing works [16,22], load imbalance can severely hurt performance on the GPU; we solve this issue through an algorithm-hardware co-design. On the algorithm side, our pattern-based pruning constrains all filters in the same layer to have the same number of un-pruned (non-zero) weights.…”
Section: Pattern-accelerated SpMM For Sparse Convolution
Citation type: mentioning
confidence: 99%
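The load-balance point in the last excerpt can be made concrete: if pruning leaves exactly the same number of non-zero weights in every filter, every thread block performs identical work. The CUDA sketch below is only an illustration of that idea under assumed names and a fixed per-filter budget K, not the quoted paper's kernel; it stages one filter's (index, value) pairs in shared memory, mirroring the tiling the excerpt describes.

// Illustration of the balanced-pruning idea (hypothetical names, not the
// quoted paper's kernel): every filter keeps exactly K non-zero weights,
// so each block stages one filter's pattern in shared memory and runs the
// same inner-loop trip count -- no straggler rows. Assumes blockDim.x >= K.
#define K 32  // assumed fixed non-zero count per pruned filter

__global__ void pruned_spmm(const int *w_idx,    // [filters][K] input-channel indices
                            const float *w_val,  // [filters][K] surviving weights
                            const float *x,      // [channels][pixels] input features
                            float *y,            // [filters][pixels] output features
                            int pixels)
{
    __shared__ int   s_idx[K];
    __shared__ float s_val[K];
    int f = blockIdx.y;  // one filter per block row

    // Cooperatively load this filter's sparse pattern into shared memory.
    if (threadIdx.x < K) {
        s_idx[threadIdx.x] = w_idx[f * K + threadIdx.x];
        s_val[threadIdx.x] = w_val[f * K + threadIdx.x];
    }
    __syncthreads();

    int p = blockIdx.x * blockDim.x + threadIdx.x;  // one output pixel per thread
    if (p >= pixels) return;

    float acc = 0.0f;
    for (int i = 0; i < K; ++i)  // identical trip count in every block
        acc += s_val[i] * x[s_idx[i] * pixels + p];
    y[f * pixels + p] = acc;
}

Launched with a 2-D grid (gridDim.y = number of filters) and blockDim.x >= K, every block executes the same K-iteration inner loop, which is the straggler-free behavior the algorithm-hardware co-design targets.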