2017 46th International Conference on Parallel Processing (ICPP)
DOI: 10.1109/icpp.2017.19
High-Performance and Memory-Saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU

Cited by 60 publications (84 citation statements). References 17 publications.

“…In its marketing materials, cuSPARSE claims a 2-5× speedup over CPU competitors, and the raw computational and memory throughput of a GPU has a similar multiple over the CPU, so we believe this kernel represents the most significant opportunity to improve GPU performance. Recent GPU library implementations, including bhSPARSE [7], nsparse [10], and RMerge2 [3], have demonstrated significant speedups over cuSPARSE, and may be well-suited for the matrix operations we require in this challenge. cuSPARSE has the unenviable task of running effectively on any sparse matrix and thus its developers may have concentrated more on generality than performance.…”
Section: Discussion
confidence: 99%
“…In the beginning, we resorted to the standard kernel available in the cusparse [23] library (cusparseDcsrmm). However, we found that its performance was far from being optimal and we changed our code to use Nsparse, a recent implementation of sparse matrix-matrix product available in open source format [24]. Nsparse, as the implementation of Suitor, relies on the legacy shuffle primitives, nevertheless it provides a clear advantage with respect to the general-purpose primitives available in cusparse.…”
Section: Setup of the Preconditioner
confidence: 99%
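The "legacy shuffle primitives" mentioned in the excerpt are CUDA's warp-level data-exchange intrinsics (__shfl and friends), which since CUDA 9 have been superseded by the _sync variants. As a point of reference only, here is a minimal sketch of the warp-shuffle reduction pattern such kernels build on; the kernel and its names are illustrative, not code from Nsparse or cuSPARSE.

#include <cstdio>

__device__ float warpReduceSum(float v) {
    // Tree reduction across the 32 lanes of a warp; the legacy form of
    // each step would be __shfl_down(v, offset), without the mask.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;  // lane 0 ends up holding the warp-wide sum
}

__global__ void warpSumKernel(const float* x, int n, float* out) {
    float v = (threadIdx.x < n) ? x[threadIdx.x] : 0.0f;
    v = warpReduceSum(v);
    if (threadIdx.x == 0) *out = v;
}

int main() {
    const int n = 32;
    float h[n], result = 0.0f;
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    float *dx, *dout;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dout, sizeof(float));
    cudaMemcpy(dx, h, n * sizeof(float), cudaMemcpyHostToDevice);
    warpSumKernel<<<1, 32>>>(dx, n, dout);
    cudaMemcpy(&result, dout, sizeof(float), cudaMemcpyDeviceToHost);
    printf("warp sum = %f\n", result);  // expected: 32.0
    cudaFree(dx);
    cudaFree(dout);
    return 0;
}

Register-to-register exchange of this kind avoids a round trip through shared memory, which is the advantage the excerpt credits such kernels with over general-purpose primitives.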
“…First, we show light-weight thread scheduling scheme with load-balancing for SpGEMM. Next, we show the optimization schemes for hash table based SpGEMM, which is proposed for GPU [25], and heap based shared-memory SpGEMM algorithms [3]. Additionally, we extend the Hash SpGEMM with utilizing vector registers of Intel Xeon or Xeon Phi.…”
Section: Architecture-Specific Optimization of SpGEMM
confidence: 99%
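The excerpt does not reproduce the scheduling scheme itself, but the load-balancing idea common to this line of work (including the cited GPU paper [25]) is to estimate each output row's work as its number of intermediate products and group rows into bins handled by appropriately sized thread teams. The sketch below illustrates only that binning step; binRowsByWork, the bin thresholds, and the toy matrices are assumptions for illustration.

#include <cstdio>
#include <vector>

// For C = A * B in CSR format, the number of intermediate products of
// row i is the sum of nnz(B row k) over the nonzeros a_ik. Binning rows
// by this count lets each bin be handled by a kernel (or thread team)
// sized for its load. Thresholds below are placeholders.
std::vector<int> binRowsByWork(const std::vector<int>& aRowPtr,
                               const std::vector<int>& aColIdx,
                               const std::vector<int>& bRowPtr) {
    int m = (int)aRowPtr.size() - 1;
    std::vector<int> bin(m);
    for (int i = 0; i < m; ++i) {
        long flops = 0;
        for (int j = aRowPtr[i]; j < aRowPtr[i + 1]; ++j) {
            int k = aColIdx[j];
            flops += bRowPtr[k + 1] - bRowPtr[k];  // nnz of B's row k
        }
        // Bin 0: fits one warp; bin 1: one block; bin 2: heavy rows.
        bin[i] = (flops <= 32) ? 0 : (flops <= 512) ? 1 : 2;
    }
    return bin;
}

int main() {
    // Toy 2x2 A with rows {cols 0,1} and {col 1}; B rows have 3 and 1 nnz.
    std::vector<int> aRowPtr = {0, 2, 3}, aColIdx = {0, 1, 1};
    std::vector<int> bRowPtr = {0, 3, 4};
    std::vector<int> bin = binRowsByWork(aRowPtr, aColIdx, bRowPtr);
    for (int i = 0; i < (int)bin.size(); ++i)
        printf("row %d -> bin %d\n", i, bin[i]);
    return 0;
}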
“…We use hash table for accumulator in SpGEMM computation, based on GPU work [25]. Figure 7 shows the algorithm of Hash SpGEMM for multi-and many-core processors.…”
Section: Hash SpGEMM
confidence: 99%
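To make the quoted idea concrete: a hash-based accumulator stores the (column index, partial value) pairs produced while forming one row of C = A*B, merging products that land on the same column. Below is a minimal sequential sketch of that accumulator under assumed names (hashAccumulate, TABLE_SIZE); the actual Hash SpGEMM of [25] places the table in GPU shared memory and inserts with atomic operations, which is not shown here.

#include <cstdio>

#define TABLE_SIZE 64   // power of two, sized past the row's nnz upper bound
#define EMPTY (-1)

// Insert (col, val) into an open-addressing hash table, accumulating when
// the same column index is seen again. Linear probing; a sketch of the
// accumulator idea only, not the paper's GPU kernel.
void hashAccumulate(int* keys, double* vals, int col, double val) {
    int h = (col * 107) & (TABLE_SIZE - 1);  // cheap multiplicative hash
    while (keys[h] != EMPTY && keys[h] != col)
        h = (h + 1) & (TABLE_SIZE - 1);      // probe the next slot
    keys[h] = col;
    vals[h] += val;
}

int main() {
    int keys[TABLE_SIZE];
    double vals[TABLE_SIZE] = {0};
    for (int i = 0; i < TABLE_SIZE; ++i) keys[i] = EMPTY;

    // Intermediate products of one row of C = A*B: (column, value) pairs.
    hashAccumulate(keys, vals, 7, 1.5);
    hashAccumulate(keys, vals, 3, 2.0);
    hashAccumulate(keys, vals, 7, 0.5);  // same column as before: accumulates

    for (int i = 0; i < TABLE_SIZE; ++i)
        if (keys[i] != EMPTY)
            printf("C(row, %d) = %g\n", keys[i], vals[i]);
    return 0;
}

Compared with a dense accumulator of one entry per column of B, the table only needs space proportional to the row's nonzero count, which is where the memory saving in the paper's title comes from.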