2014 IEEE 28th International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2014.47

An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data

Abstract: General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method, breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM algorithm has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the result sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the result sparse matrix dominate the execution time, and (3) load balancing must account for sparse data in a fine-grained manner.
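To make irregularities (1) and (2) concrete, here is a minimal host-side C++ sketch of Gustavson's row-wise formulation of C = AB for CSR inputs. This is an illustrative sketch only, not the paper's GPU algorithm, and all identifiers are hypothetical: the size of each result row's accumulator is only known after all merges (irregularity 1), and each product term is inserted at an effectively random column position (irregularity 2).

```cpp
#include <map>
#include <vector>

// One row of C = A*B via Gustavson's row-wise formulation (CSR inputs).
// The map accumulates c_i:, so the nnz of the result row is only known
// after all merges have finished.
std::map<int, double> spgemmRow(int i,
                                const std::vector<int>& rowPtrA,
                                const std::vector<int>& colIdxA,
                                const std::vector<double>& valA,
                                const std::vector<int>& rowPtrB,
                                const std::vector<int>& colIdxB,
                                const std::vector<double>& valB) {
    std::map<int, double> row;  // sorted accumulator for row c_i:
    for (int k = rowPtrA[i]; k < rowPtrA[i + 1]; ++k) {
        int j = colIdxA[k];  // nonzero column j of row a_i:
        for (int l = rowPtrB[j]; l < rowPtrB[j + 1]; ++l)
            row[colIdxB[l]] += valA[k] * valB[l];  // insert at a random position
    }
    return row;
}
```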

Cited by 96 publications (73 citation statements) · References 36 publications
“…The performance rate is defined as the ratio of the arithmetic workload to the measured processing time. The arithmetic workload $\mathrm{flops}(A,B)$ is defined as twice the number of nontrivial scalar multiplications (to account for the additions), which can be computed as $\sum_{j \in \hat{a}_{i:}} \mathrm{nnz}(b_{j:})$ for each result row $c_{i:}$ [12,31], where $\hat{a}_{i:}$ denotes the nonzero column indices of row $a_{i:}$. All performance rate measurements were repeated 11 times and the median was used because of its robustness with respect to outliers.…”
Section: Performance Measurements
Citation type: mentioning
confidence: 99%
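Assuming A and B are stored in CSR, the quoted workload formula can be evaluated on the host in a few lines. The sketch below uses hypothetical names and is not taken from any cited code; it sums $\mathrm{nnz}(b_{j:})$ over the nonzero columns of every row of A and doubles the total.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// flops(A,B): total multiplications = sum over rows i and nonzero columns j
// of a_i: of nnz(b_j:); doubled to account for the additions.
std::int64_t flopsAB(const std::vector<int>& rowPtrA,
                     const std::vector<int>& colIdxA,
                     const std::vector<int>& rowPtrB) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i + 1 < rowPtrA.size(); ++i) {
        for (int k = rowPtrA[i]; k < rowPtrA[i + 1]; ++k) {
            int j = colIdxA[k];                    // nonzero column j in row a_i:
            total += rowPtrB[j + 1] - rowPtrB[j];  // nnz(b_j:)
        }
    }
    return 2 * total;
}
```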
“…Table 1, were taken from the University of Florida Sparse Matrix Collection [13]. Inspired by [31], we split the matrices into regular (the upper 10) and irregular (the lower 11) groups and sorted each group alphabetically. Regular matrices result from problems involving mesh approximations, e.g., from finite element methods, while irregular matrices mostly result from network structures.…”
Section: Performance Measurements
Citation type: mentioning
confidence: 99%
“…Parallelization of the computation on the device is applied mainly to the matrix product (Liu & Vinter, 2014) and to the evaluation of the sigmoid function. We define a fixed number of threads per block, and the number of blocks is computed as the size of the layer divided by the number of threads.…”
Section: Implementation Of Neural Network
Citation type: mentioning
confidence: 99%
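A minimal sketch of the launch configuration this passage describes, assuming CUDA-style <<<blocks, threads>>> indexing. The concrete sizes are hypothetical, and the upward rounding is our assumption (not stated in the quote) to cover layer sizes that are not a multiple of the block size.

```cpp
#include <cstdio>

int main() {
    const int layerSize = 1000;       // width of the network layer (assumed)
    const int threadsPerBlock = 256;  // fixed thread count per block (assumed)
    // Block count = layer size / thread count, rounded up so that every
    // neuron is covered by a thread.
    const int numBlocks = (layerSize + threadsPerBlock - 1) / threadsPerBlock;
    std::printf("kernel<<<%d, %d>>> launches %d threads for %d neurons\n",
                numBlocks, threadsPerBlock,
                numBlocks * threadsPerBlock, layerSize);
    return 0;
}
```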
“…Thus more evaluation criteria, such as format conversion cost and memory footprint, must be taken into consideration. Secondly, when the SpMV operation is used with other sparse building blocks (e.g., sparse matrix-matrix multiplication [11]) that require basic storage formats, using the all-new formats is less feasible.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
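For reference, a "basic storage format" kernel of the kind this passage refers to: a sequential CSR SpMV sketch in C++. This is illustrative only; the identifiers are not from any cited library.

```cpp
#include <cstddef>
#include <vector>

// y = A*x with A in CSR, the common baseline format that composes directly
// with other sparse building blocks such as SpGEMM.
std::vector<double> spmvCsr(const std::vector<int>& rowPtr,
                            const std::vector<int>& colIdx,
                            const std::vector<double>& val,
                            const std::vector<double>& x) {
    std::vector<double> y(rowPtr.size() - 1, 0.0);
    for (std::size_t i = 0; i + 1 < rowPtr.size(); ++i)
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k)
            y[i] += val[k] * x[colIdx[k]];  // gather from x, accumulate row i
    return y;
}
```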