Proceedings of the International Conference on Supercomputing 2017
DOI: 10.1145/3079079.3079106

On improving performance of sparse matrix-matrix multiplication on GPUs

Abstract: Sparse matrix-matrix multiplication (SpGEMM) is an important primitive for many data analytics algorithms, such as Markov clustering. Unlike the dense case, where performance of matrix-matrix multiplication is considerably higher than matrix-vector multiplication, the opposite is true for the sparse case on GPUs. A significant challenge is that the sparsity structure of the output sparse matrix is not known a priori, and many additive contributions must be combined to generate its non-zero elements. We use synt…
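
For orientation, the sketch below is a minimal host-side reference of the row-wise (Gustavson-style) formulation that GPU SpGEMM kernels commonly build on; it is not the paper's algorithm, and all identifiers are illustrative. It makes the abstract's point concrete: each output row's nonzero count is only known once all additive contributions have been merged, so the structure of C cannot be allocated up front.

```cuda
#include <map>
#include <vector>

// CSR storage: row_ptr has n_rows + 1 entries; col_idx/vals hold the nonzeros.
struct Csr {
    std::vector<int>    row_ptr;
    std::vector<int>    col_idx;
    std::vector<double> vals;
};

// Row-wise (Gustavson-style) SpGEMM reference: C = A * B.
// Each nonzero A(i,k) scales row k of B, and the partial products are merged
// by column index. How many distinct columns survive the merge -- i.e. the
// sparsity of row i of C -- is only known after the whole row has been
// accumulated, which is the a-priori-unknown structure the abstract refers to.
Csr spgemm_reference(const Csr& A, const Csr& B, int n_rows_A) {
    Csr C;
    C.row_ptr.push_back(0);
    for (int i = 0; i < n_rows_A; ++i) {
        std::map<int, double> acc;                       // per-row accumulator
        for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
            int    k   = A.col_idx[p];
            double aik = A.vals[p];
            for (int q = B.row_ptr[k]; q < B.row_ptr[k + 1]; ++q)
                acc[B.col_idx[q]] += aik * B.vals[q];    // additive contribution
        }
        for (const auto& [col, val] : acc) {             // write out the merged row
            C.col_idx.push_back(col);
            C.vals.push_back(val);
        }
        C.row_ptr.push_back(static_cast<int>(C.col_idx.size()));
    }
    return C;
}
```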

Cited by 19 publications (11 citation statements). References 21 publications (22 reference statements).
“…This boundary becomes too small for many sparse datasets which would instead benefit from coupling the shared memory size to individual row degrees. Inspired by other sparse matrix multiplication implementations on the GPU [8,30,32,34], we enhanced the vector insertion and lookup patterns of the COO SPMV design outlined in [2] by building a hash table to store these columns in shared memory. Unlike many other hash table implementations on the GPU [5,6,9,13,39], our implementation builds an independent hash table per thread-block and so many other designs and concurrency patterns that optimize the key distribution and collision-resolution strategies for the GPU are not efficient or cannot be easily ported for our use-case.…”
Section: Load Balanced Hybrid CSR+COO
Citation type: mentioning, confidence: 99%
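
The block-private shared-memory hash table described in this excerpt is a recurring GPU accumulation pattern. The sketch below shows one plausible shape of it, assuming a fixed-capacity table, a simple multiplicative hash, and linear probing with atomicCAS/atomicAdd; the constants and names are placeholders and none of it is taken from the cited implementation. Because each block owns its table, no cross-block synchronization or global collision-resolution scheme is needed, which matches the excerpt's point that generic GPU hash-table designs do not port directly to this use case.

```cuda
#include <cuda_runtime.h>

// Illustrative constants -- not taken from the cited implementation.
#define TABLE_SIZE 1024          // power of two; the table lives in shared memory
#define EMPTY_KEY  (-1)

__device__ inline int hash_col(int col) {
    return (int)(((unsigned)col * 2654435761u) & (TABLE_SIZE - 1));
}

// Insert (col, val): linear probing, claiming empty slots with atomicCAS and
// accumulating values with atomicAdd. Assumes the number of distinct columns
// handled by one block fits in TABLE_SIZE, otherwise probing never terminates.
__device__ void table_accumulate(int* keys, float* vals, int col, float val) {
    int slot = hash_col(col);
    while (true) {
        int prev = atomicCAS(&keys[slot], EMPTY_KEY, col);
        if (prev == EMPTY_KEY || prev == col) {          // slot claimed, or already ours
            atomicAdd(&vals[slot], val);
            return;
        }
        slot = (slot + 1) & (TABLE_SIZE - 1);            // linear probing
    }
}

// Each block accumulates one chunk of COO entries into its own private table.
__global__ void accumulate_coo_chunk(const int* cols, const float* vals, int chunk_len,
                                     int* out_keys, float* out_vals) {
    __shared__ int   s_keys[TABLE_SIZE];
    __shared__ float s_vals[TABLE_SIZE];
    for (int i = threadIdx.x; i < TABLE_SIZE; i += blockDim.x) {
        s_keys[i] = EMPTY_KEY;
        s_vals[i] = 0.0f;
    }
    __syncthreads();

    int base = blockIdx.x * chunk_len;
    for (int i = threadIdx.x; i < chunk_len; i += blockDim.x)
        table_accumulate(s_keys, s_vals, cols[base + i], vals[base + i]);
    __syncthreads();

    // Flush occupied slots to a per-block staging area in global memory;
    // a real kernel would compact or merge these results further.
    int out_base = blockIdx.x * TABLE_SIZE;
    for (int i = threadIdx.x; i < TABLE_SIZE; i += blockDim.x) {
        if (s_keys[i] != EMPTY_KEY) {
            out_keys[out_base + i] = s_keys[i];
            out_vals[out_base + i] = s_vals[i];
        }
    }
}
```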
“…Related work on parallel SpGEMM. SpGEMM algorithms are extensively studied in the literature, with several parallel algorithms available for distributed memory systems [7], [9]- [11], for GPUs [12]- [19], and for multi-core systems [12], [20], [21]. Multiplying sparse matrices in parallel can be challenging for a number of reasons.…”
Section: HipMCL
Citation type: mentioning, confidence: 99%
“…Thread-Flat-Parallel: We use a Thread-Flat-Parallel scheme (Figure 3b) to overcome the limitations of the previous methods. This has also been explored in [8] and [19]. In this scheme, a row of A is assigned to a team, but as opposed to the Thread-Parallel scheme, this method flattens the second and third loop (Line-4 and Line-5).…”
Section: SpGEMM Partitioning Schemes
Citation type: mentioning, confidence: 99%
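
A hedged sketch of what the flattened inner iteration quoted above can look like: one thread block ("team") per row of A, an exclusive prefix sum over the lengths of the referenced rows of B, and a single block-strided loop over all partial products instead of two nested loops. The dense scratch row with atomicAdd accumulation, the MAX_ROW_NNZ bound, and all names are simplifying assumptions for illustration, not the scheme from [8], [19], or the cited paper.

```cuda
#include <cuda_runtime.h>

#define MAX_ROW_NNZ 128   // assumed upper bound on nonzeros per row of A (sketch only)

// One block per row of A. The two inner loops -- over A(i,k) and over row k
// of B -- are collapsed into a single flat index space of partial products,
// which all threads of the block stride over together.
// Requires sm_60+ for double-precision atomicAdd.
__global__ void spgemm_row_flat(const int* A_rowptr, const int* A_cols, const double* A_vals,
                                const int* B_rowptr, const int* B_cols, const double* B_vals,
                                double* C_dense_rows, int n_cols_B) {
    int row     = blockIdx.x;
    int a_begin = A_rowptr[row];
    int a_len   = A_rowptr[row + 1] - a_begin;

    // Exclusive prefix sum of B-row lengths: maps a flat index back to
    // (A entry, B entry). Serial here for brevity; assumes a_len <= MAX_ROW_NNZ.
    __shared__ int offsets[MAX_ROW_NNZ + 1];
    if (threadIdx.x == 0) {
        offsets[0] = 0;
        for (int p = 0; p < a_len; ++p) {
            int k = A_cols[a_begin + p];
            offsets[p + 1] = offsets[p] + (B_rowptr[k + 1] - B_rowptr[k]);
        }
    }
    __syncthreads();

    int     total = offsets[a_len];                      // partial products in this row
    double* c_row = C_dense_rows + (size_t)row * n_cols_B;

    // Flattened loop: every thread walks the whole product list with a block stride.
    for (int t = threadIdx.x; t < total; t += blockDim.x) {
        int p = 0;                                       // which A entry does t fall under?
        while (offsets[p + 1] <= t) ++p;                 // linear scan; binary search in practice
        int k = A_cols[a_begin + p];
        int q = B_rowptr[k] + (t - offsets[p]);
        atomicAdd(&c_row[B_cols[q]], A_vals[a_begin + p] * B_vals[q]);
    }
}
```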