2014 IEEE International Parallel & Distributed Processing Symposium Workshops
DOI: 10.1109/ipdpsw.2014.160

Predicting an Optimal Sparse Matrix Format for SpMV Computation on GPU

Abstract: Graphics Processing Units (GPUs), with their many-threaded architecture, are well suited to high-performance general-purpose computation. The processor hides memory access latency by scheduling warps (groups of 32 threads) so that while one warp computes, other warps perform their memory accesses. For memory-bound irregular applications such as Sparse Matrix Vector Multiplication (SpMV), however, memory access times dominate, and hence improving the performance of s…
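The irregular memory access the abstract describes is easiest to see in code. Below is a minimal scalar CSR SpMV kernel in CUDA, a generic textbook formulation rather than the kernel from this paper: one thread handles one row, and the data-dependent loads of x[col_idx[j]] are the uncoalesced accesses that make the operation memory-bound.

```cuda
// Minimal scalar CSR SpMV kernel: one thread per row.
// A generic illustration of the access pattern the abstract describes,
// not the kernel from the paper under discussion.
__global__ void spmv_csr_scalar(int n_rows,
                                const int *row_ptr,   // size n_rows + 1
                                const int *col_idx,   // size nnz
                                const float *values,  // size nnz
                                const float *x,       // dense input vector
                                float *y)             // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        float sum = 0.0f;
        // The data-dependent loads of x[col_idx[j]] rarely coalesce
        // across a warp; this is what makes SpMV memory-bound.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += values[j] * x[col_idx[j]];
        y[row] = sum;
    }
}
```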

Cited by 16 publications (5 citation statements) | References 10 publications
“…We leave it for future work to see whether this approach improves the prediction accuracy for our experiments. Most of the other autotuning works have smaller matrix sets than ours, for example, ∼14–150 [Grewe and Lokhmotov 2011; Muralidharan et al. 2014; Neelima et al. 2014; Guo et al. 2014]. There also are studies with bigger matrix sets, for example, ∼2000 in Li et al. [2013] and 1000 (synthetic) in Armstrong and Rendell [2008].…”
Section: Related Work
confidence: 83%
“…In existing work, features are usually determined according to the matrix storage formats, not code size. N and NZ are almost always collected as features (e.g., El Zein and Rendell [2012], Rendell [2008, 2010], Li et al. [2013], and Neelima et al. [2014]). Other features include […] [Li et al. 2013; Neelima et al. 2014], and memory traffic (number of bytes fetched, number of writes to w) [Belgin et al. 2011].…”
Section: Features
confidence: 99%
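As a concrete illustration of the feature sets this citation describes, here is a host-side sketch that computes N, NZ, and a simple row-length statistic from a CSR row-pointer array. The struct and function names are hypothetical, not drawn from any of the cited papers.

```cuda
// Host-side sketch of format-prediction feature extraction.
// Names (SpmvFeatures, extract_features) are illustrative assumptions.
#include <algorithm>

struct SpmvFeatures {
    int   n;            // number of rows (N)
    int   nnz;          // number of non-zeros (NZ)
    float mean_nnz_row; // average non-zeros per row
    int   max_nnz_row;  // longest row (drives ELL padding cost)
};

SpmvFeatures extract_features(int n, const int *row_ptr)
{
    SpmvFeatures f{n, row_ptr[n], 0.0f, 0};
    for (int i = 0; i < n; ++i) {
        int len = row_ptr[i + 1] - row_ptr[i];
        f.max_nnz_row = std::max(f.max_nnz_row, len);
    }
    f.mean_nnz_row = (n > 0) ? float(f.nnz) / float(n) : 0.0f;
    return f;
}
```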
“…We can find in the literature many analytical approaches that identify the optimal sparse matrix format for GPUs based on performance models [12-14]. They show good accuracy, but the models are usually tested on a small set of matrices.…”
Section: Related Work
confidence: 99%
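For flavor, a toy version of such an analytical selector might compare the padding overhead of ELLPACK against CSR. The rule and the 1.5 threshold below are illustrative assumptions, not the performance models from references [12-14].

```cuda
// Toy analytical format selector: picks ELL only when its zero-padding
// overhead is modest. Threshold and rule are illustrative assumptions.
enum class Format { CSR, ELL };

Format choose_format(int n, int nnz, int max_nnz_row)
{
    // ELL stores max_nnz_row entries for every row; when the padding
    // stays small, its regular layout usually wins on GPUs.
    float padded   = float(n) * float(max_nnz_row);
    float overhead = padded / float(nnz); // 1.0 means no padding at all
    return (overhead < 1.5f) ? Format::ELL : Format::CSR;
}
```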
“…Using shared memory can yield substantial performance gains compared with global memory access, which takes far more clock cycles. Readers are directed to some of the first author's work on various optimizations and the use of GPUs for scientific computation at […].…”
Section: Introduction
confidence: 99%
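A common shape for the shared-memory optimization this citation mentions is the vector CSR kernel, in which a 32-thread warp cooperates on one row and stages partial sums in fast shared memory instead of making repeated trips to global memory. This is a generic sketch (a block size of 128 is assumed), not code from the cited work.

```cuda
// Vector CSR SpMV: one warp per row, with the per-thread partial sums
// reduced in shared memory. Generic sketch; blockDim.x == 128 assumed.
__global__ void spmv_csr_vector(int n_rows,
                                const int *row_ptr, const int *col_idx,
                                const float *values, const float *x,
                                float *y)
{
    __shared__ float partial[128];
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;   // position within the warp
    int row  = tid / 32;           // one 32-thread warp per row
    if (row < n_rows) {            // uniform across the warp
        float sum = 0.0f;
        // Lanes stride across the row's non-zeros, 32 at a time.
        for (int j = row_ptr[row] + lane; j < row_ptr[row + 1]; j += 32)
            sum += values[j] * x[col_idx[j]];
        partial[threadIdx.x] = sum;
        // Tree reduction in shared memory rather than global memory.
        for (int off = 16; off > 0; off >>= 1) {
            __syncwarp();
            if (lane < off)
                partial[threadIdx.x] += partial[threadIdx.x + off];
        }
        if (lane == 0) y[row] = partial[threadIdx.x];
    }
}
```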