Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2010
DOI: 10.1145/1693453.1693471
Model-driven autotuning of sparse matrix-vector multiply on GPUs

Abstract: We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing units (GPUs). Our study consists of two parts. First, we describe several carefully hand-tuned SpMV implementations for GPUs, identifying key GPU-specific performance limitations, enhancements, and tuning opportunities. These implementations, which include variants on classical blocked compressed sparse row (BCSR) and blocked ELLPACK (…


Cited by 221 publications (106 citation statements)
References 16 publications
“…Choi et al. [12] designed a blocked ELLPACK format and proposed a CUDA performance model to predict matrix-dependent tuning parameters. Xu et al. [13] proposed an optimized SpMV based on the ELL format and a CUDA performance model for SpMV.…”
Section: Related Work
confidence: 99%
“…Because of reduced off-chip memory access and better on-chip data locality, block-based formats and libraries, such as OSKI [38,42,43], pOSKI [37], CSB [44,45], BELLPACK [46], BCCOO/BCCOO+ [5], BRC [6] and RSB [47], have attracted the most attention. However, block-based formats depend heavily on the sparsity structure: the input matrix must exhibit a block structure that matches the intended block layout.…”
Section: Comparison To Related Methods
confidence: 99%
“…Their method is general but requires that global memory bandwidth not be the performance bottleneck. In [35], a new compressed format is proposed for sparse matrices on GPUs, and a search is needed to determine certain parameters of the format. They proposed an analytical model specific to SpMV to eliminate search candidates.…”
Section: Related Work
confidence: 99%