2014 IEEE International Parallel & Distributed Processing Symposium Workshops
DOI: 10.1109/ipdpsw.2014.160

Predicting an Optimal Sparse Matrix Format for SpMV Computation on GPU

Abstract: Graphics Processing Units (GPUs), with their many-threaded architecture, are well suited to high-performance general-purpose computation. The processor hides memory access latency by scheduling warps (groups of 32 threads) so that while one warp computes, other warps perform their memory accesses. For memory-bound irregular applications such as Sparse Matrix Vector Multiplication (SpMV), however, memory access times dominate, and hence improving the performance of s…
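The irregular memory access the abstract describes is easiest to see in code. Below is a minimal scalar CSR SpMV kernel in CUDA, a generic textbook formulation rather than the kernel from this paper: one thread handles one row, and the data-dependent loads of x[col_idx[j]] are the uncoalesced accesses that make the operation memory-bound.

```cuda
// Minimal scalar CSR SpMV kernel: one thread per row.
// A generic illustration of the access pattern the abstract describes,
// not the kernel from the paper under discussion.
__global__ void spmv_csr_scalar(int n_rows,
                                const int *row_ptr,   // size n_rows + 1
                                const int *col_idx,   // size nnz
                                const float *values,  // size nnz
                                const float *x,       // dense input vector
                                float *y)             // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        float sum = 0.0f;
        // The data-dependent loads of x[col_idx[j]] rarely coalesce
        // across a warp; this is what makes SpMV memory-bound.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += values[j] * x[col_idx[j]];
        y[row] = sum;
    }
}
```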

Cited by 16 publications (5 citation statements) | References 10 publications
“…We leave it for future work to see whether this approach improves the prediction accuracy for our experiments. Most of the other autotuning works have smaller matrix sets than ours, for example, ∼14–150 [Grewe and Lokhmotov 2011; Muralidharan et al. 2014; Neelima et al. 2014; Guo et al. 2014]. There also are studies with bigger matrix sets, for example, ∼2000 in Li et al. [2013] and 1000 (synthetic) in Armstrong and Rendell [2008].…”
Section: Related Work
confidence: 83%
“…In existing work, features are usually determined according to the matrix storage formats, not code size. N and NZ are almost always collected as features (e.g., El Zein and Rendell [2012], Rendell [2008, 2010], Li et al. [2013], and Neelima et al. [2014]). Other features include […] [Li et al. 2013; Neelima et al. 2014], and memory traffic (number of bytes fetched, number of writes to w) [Belgin et al. 2011].…”
Section: Features
confidence: 99%
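As a concrete illustration of the feature sets this citation describes, here is a host-side sketch that computes N, NZ, and a simple row-length statistic from a CSR row-pointer array. The struct and function names are hypothetical, not drawn from any of the cited papers.

```cuda
// Host-side sketch of format-prediction feature extraction.
// Names (SpmvFeatures, extract_features) are illustrative assumptions.
#include <algorithm>

struct SpmvFeatures {
    int   n;            // number of rows (N)
    int   nnz;          // number of non-zeros (NZ)
    float mean_nnz_row; // average non-zeros per row
    int   max_nnz_row;  // longest row (drives ELL padding cost)
};

SpmvFeatures extract_features(int n, const int *row_ptr)
{
    SpmvFeatures f{n, row_ptr[n], 0.0f, 0};
    for (int i = 0; i < n; ++i) {
        int len = row_ptr[i + 1] - row_ptr[i];
        f.max_nnz_row = std::max(f.max_nnz_row, len);
    }
    f.mean_nnz_row = (n > 0) ? float(f.nnz) / float(n) : 0.0f;
    return f;
}
```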
“…We can find in the literature many analytical approaches that identify the optimal sparse matrix format for GPUs based on performance models [12-14]. They show good accuracy, but the models are usually tested on a small set of matrices.…”
Section: Related Work
confidence: 99%
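For flavor, a toy version of such an analytical selector might compare the padding overhead of ELLPACK against CSR. The rule and the 1.5 threshold below are illustrative assumptions, not the performance models from references [12-14].

```cuda
// Toy analytical format selector: picks ELL only when its zero-padding
// overhead is modest. Threshold and rule are illustrative assumptions.
enum class Format { CSR, ELL };

Format choose_format(int n, int nnz, int max_nnz_row)
{
    // ELL stores max_nnz_row entries for every row; when the padding
    // stays small, its regular layout usually wins on GPUs.
    float padded   = float(n) * float(max_nnz_row);
    float overhead = padded / float(nnz); // 1.0 means no padding at all
    return (overhead < 1.5f) ? Format::ELL : Format::CSR;
}
```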
“…Using shared memory can yield substantial performance gains compared with global memory access, which takes far more clock cycles. Readers are directed to some of the first author's work on various optimizations and the use of GPUs for scientific computation at […].…”
Section: Introduction
confidence: 99%
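A common shape for the shared-memory optimization this citation mentions is the vector CSR kernel, in which a 32-thread warp cooperates on one row and stages partial sums in fast shared memory instead of making repeated trips to global memory. This is a generic sketch (a block size of 128 is assumed), not code from the cited work.

```cuda
// Vector CSR SpMV: one warp per row, with the per-thread partial sums
// reduced in shared memory. Generic sketch; blockDim.x == 128 assumed.
__global__ void spmv_csr_vector(int n_rows,
                                const int *row_ptr, const int *col_idx,
                                const float *values, const float *x,
                                float *y)
{
    __shared__ float partial[128];
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;   // position within the warp
    int row  = tid / 32;           // one 32-thread warp per row
    if (row < n_rows) {            // uniform across the warp
        float sum = 0.0f;
        // Lanes stride across the row's non-zeros, 32 at a time.
        for (int j = row_ptr[row] + lane; j < row_ptr[row + 1]; j += 32)
            sum += values[j] * x[col_idx[j]];
        partial[threadIdx.x] = sum;
        // Tree reduction in shared memory rather than global memory.
        for (int off = 16; off > 0; off >>= 1) {
            __syncwarp();
            if (lane < off)
                partial[threadIdx.x] += partial[threadIdx.x + off];
        }
        if (lane == 0) y[row] = partial[threadIdx.x];
    }
}
```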