2016
DOI: 10.1137/15m104253x
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

Abstract: Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been anal…

Cited by 90 publications (90 citation statements) | References 38 publications
“…value ← value ⊕ A^T(i, j) ⊗ v(j). Our column-based masked matvec follows Gustavson's algorithm for SpGEMM (sparse matrix-sparse matrix multiplication), but specialized to matvec [19]. The key challenge in parallelizing Gustavson's algorithm is solving the multiway merge problem [1]. For the GPU, our parallelization approach follows the scan-gather-sort approach outlined by Yang et al. [32] and is shown in Algorithm 3.…”
Section: Row-based Masked Matvec (Pull Phase) | mentioning, confidence: 99%
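As a concrete illustration of the accumulation rule quoted above, here is a minimal serial sketch of a masked matvec over a generic (⊕, ⊗) semiring in the Gustavson style. The CSR-of-A^T layout, the function names, and the default plus-times semiring are illustrative assumptions; this is not the cited GPU kernel, which uses the scan-gather-sort scheme of Yang et al.

```python
# Minimal serial sketch (assumed names/layout, not the cited GPU kernel): a masked
# matvec over a generic (add ⊕, mul ⊗) semiring, computing
#   y[i] = ⊕_j A^T(i, j) ⊗ v[j]   only for entries i selected by the mask.
# A^T is stored in CSR (equivalently, A in CSC); the plus-times defaults are
# placeholders for whichever semiring the application supplies.

def masked_matvec(at_indptr, at_indices, at_data, v, mask,
                  add=lambda x, y: x + y, mul=lambda x, y: x * y, identity=0.0):
    n = len(at_indptr) - 1
    y = [identity] * n
    for i in range(n):
        if not mask[i]:                      # skip entries excluded by the mask
            continue
        acc = identity
        for k in range(at_indptr[i], at_indptr[i + 1]):
            # value ← value ⊕ A^T(i, j) ⊗ v(j)
            acc = add(acc, mul(at_data[k], v[at_indices[k]]))
        y[i] = acc
    return y

# Tiny usage example: 3x3 A^T with 4 nonzeros, mask selecting entries 0 and 2.
at_indptr, at_indices, at_data = [0, 2, 3, 4], [0, 2, 1, 0], [1.0, 2.0, 3.0, 4.0]
print(masked_matvec(at_indptr, at_indices, at_data, v=[1.0, 1.0, 1.0], mask=[1, 0, 1]))
# -> [3.0, 0.0, 4.0]
```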
“…First, we show a lightweight thread-scheduling scheme with load balancing for SpGEMM. Next, we show optimization schemes for hash-table-based SpGEMM, which was proposed for GPUs [25], and for heap-based shared-memory SpGEMM algorithms [3]. Additionally, we extend the Hash SpGEMM by utilizing the vector registers of Intel Xeon or Xeon Phi.…”
Section: Architecture-Specific Optimization of SpGEMM | mentioning, confidence: 99%
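To make the hash-accumulator idea concrete, below is a small serial sketch of Gustavson-style SpGEMM that gathers each output row in a hash table. A Python dict stands in for the fixed-size hash tables the GPU and CPU kernels in the quote rely on; all names are illustrative assumptions, not the paper's API.

```python
# Hedged sketch: hash-accumulator SpGEMM on CSR inputs, one dict per output row.

def spgemm_hash(a_indptr, a_indices, a_data, b_indptr, b_indices, b_data):
    """Return C = A * B in CSR form, accumulating each row of C in a hash table."""
    c_indptr, c_indices, c_data = [0], [], []
    n_rows = len(a_indptr) - 1
    for i in range(n_rows):
        acc = {}                                   # column index -> partial value
        for ka in range(a_indptr[i], a_indptr[i + 1]):
            k, a_ik = a_indices[ka], a_data[ka]
            for kb in range(b_indptr[k], b_indptr[k + 1]):
                j = b_indices[kb]
                acc[j] = acc.get(j, 0.0) + a_ik * b_data[kb]
        for j in sorted(acc):                      # emit row i with sorted columns
            c_indices.append(j)
            c_data.append(acc[j])
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices, c_data
```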
“…In another variant of SpGEMM [3], we use a priority queue (heap), indexed by column indices, to accumulate each row of C. To construct c_i*, a heap of size nnz(a_i*) is allocated. For every nonzero a_ik, the first nonzero entry in b_k*, along with its column index, is inserted into the heap.…”
Section: Heap SpGEMM | mentioning, confidence: 99%
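The quoted description maps directly onto a multiway merge. Below is a sketch of forming one row c_i* with a min-heap keyed on column index, assuming CSR inputs and plus-times arithmetic; the names are illustrative, not the cited code.

```python
# Hedged sketch of the heap-based accumulator: row c_i* is a multiway merge of the
# rows b_k* selected by the nonzeros a_ik, driven by a min-heap on column index.
import heapq

def heap_spgemm_row(i, a_indptr, a_indices, a_data, b_indptr, b_indices, b_data):
    """Return row i of C = A * B as (columns, values) lists."""
    heap = []
    # Seed the heap with the first nonzero of b_k* for every nonzero a_ik,
    # so the heap holds at most nnz(a_i*) entries.
    for ka in range(a_indptr[i], a_indptr[i + 1]):
        k = a_indices[ka]
        kb = b_indptr[k]
        if kb < b_indptr[k + 1]:
            heapq.heappush(heap, (b_indices[kb], ka, kb))
    cols, vals = [], []
    while heap:
        j, ka, kb = heapq.heappop(heap)            # smallest remaining column index
        contrib = a_data[ka] * b_data[kb]
        if cols and cols[-1] == j:                 # same column: accumulate
            vals[-1] += contrib
        else:                                      # new column of c_i*
            cols.append(j)
            vals.append(contrib)
        kb += 1                                    # advance within the same b_k*
        if kb < b_indptr[a_indices[ka] + 1]:
            heapq.heappush(heap, (b_indices[kb], ka, kb))
    return cols, vals

# Example: row 0 of a 1x2 A times a 2x2 B (both CSR).
a = ([0, 2], [0, 1], [1.0, 2.0])                  # a_00 = 1, a_01 = 2
b = ([0, 1, 3], [1, 0, 1], [5.0, 6.0, 7.0])
print(heap_spgemm_row(0, *a, *b))                 # -> ([0, 1], [12.0, 19.0])
```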
“…In this way, load balance is achieved. Linear weak scaling efficiency is possible if, instead of the two-dimensional process grid used in Cannon's algorithm and SUMMA, a three-dimensional process grid is used, as in [3,2,11]. However, because of the random permutation of matrix rows and columns, the ability to exploit the nonzero structure to avoid data movement or communication is lost.…”
mentioning, confidence: 99%
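For reference, here is a tiny sketch of the rank layout such 3D (2.5D) formulations assume: p processes arranged as a c-layer stack of sqrt(p/c) x sqrt(p/c) grids, with c = 1 recovering the 2D SUMMA/Cannon layout. The rank-to-coordinate convention below is one plausible choice, not the one used in the cited implementations.

```python
# Hedged sketch: map a flat MPI-style rank onto a sqrt(p/c) x sqrt(p/c) x c grid.
# Each layer owns a slice of the inner dimension of A*B, and the per-layer partial
# products are summed at the end (the reduction itself is omitted here).
import math

def grid_coords(rank, p, c):
    """Return (row, col, layer) for a rank on a 3D process grid of p = c * side^2."""
    side = int(math.isqrt(p // c))
    assert side * side * c == p, "p must equal c * side^2"
    layer, within = divmod(rank, side * side)
    row, col = divmod(within, side)
    return row, col, layer

# Example: 32 processes as a 4 x 4 x 2 grid.
print([grid_coords(r, p=32, c=2) for r in range(4)])
# -> [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]
```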