Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2014
DOI: 10.1145/2555243.2555255

yaSpMV

Abstract: SpMV is a key linear algebra algorithm and has been widely used in many important application domains. As a result, numerous attempts have been made to optimize SpMV on GPUs to leverage their massive computational throughput. Although the previous work has shown impressive progress, load imbalance and high memory bandwidth remain the critical performance bottlenecks for SpMV. In this paper, we present our novel solutions to these problems. First, we devise a new SpMV format, called blocked compressed common co…

Cited by 80 publications (9 citation statements)
References 19 publications (31 reference statements)
“…Because of less off-chip memory access and better on-chip memory localization, block-based formats or libraries, such as OSKI [38,42,43], pOSKI [37], CSB [44,45], BELLPACK [46], BCCOO/BCCOO+ [5], BRC [6] and RSB [47], attracted the most attention. However, block-based formats heavily rely on sparsity structure, meaning that the input matrix is required to have a block structure to meet potential block layout.…”
Section: Comparison To Related Methods (mentioning)
confidence: 99%
“…Table 3 lists main information of the evaluated sparse matrices. The first 14 matrices of the benchmark suite have been widely used in previous work [1,2,5,6,12,13]. The last 6 matrices are chosen as representatives of irregular matrices extracted from graph applications, such as circuit simulation and optimization problems.…”
Section: Benchmark Suite (mentioning)
confidence: 99%
“…Focusing on pure performance, the state-of-the-art in SpMV optimization is arguably BCCOO [28]. This advanced sparse matrix format is an evolution of COO based on blocking and row indices compression, where load balancing is achieved by the means of a highly-efficient segmented reduction (which, however, relies on a non-portable synchronization-free mechanism that stalls on modern AMD GPUs).…”
Section: Related Work (mentioning)
confidence: 99%
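
The citation statement above describes BCCOO as a COO-derived format whose load balancing comes from a segmented reduction. As a rough illustration of that underlying idea only, the sketch below computes y = A·x on plain COO data in two phases: one product per nonzero, then a sum over each run of entries that share a row index. It is not the BCCOO format or the yaSpMV kernel: it omits blocking and row-index compression, runs sequentially on the CPU, and assumes entries are sorted by row. The function name `coo_spmv_segmented` and the sample matrix are hypothetical.

```python
def coo_spmv_segmented(rows, cols, vals, x, n_rows):
    """Compute y = A @ x for A in COO form (assumes entries sorted by row)."""
    # Phase 1: one multiplication per nonzero. This work is uniform per entry,
    # so it can be split evenly regardless of how long each row is.
    products = [v * x[c] for v, c in zip(vals, cols)]

    # Phase 2: segmented reduction. Each run of equal row indices is one
    # segment; the sum of a segment is that row's output value.
    y = [0.0] * n_rows
    i = 0
    while i < len(products):
        r = rows[i]
        acc = 0.0
        while i < len(products) and rows[i] == r:
            acc += products[i]
            i += 1
        y[r] = acc  # rows with no nonzeros keep their initial 0.0
    return y


# Hypothetical 3x3 example: A = [[1, 0, 2], [0, 0, 0], [0, 3, 4]]
rows = [0, 0, 2, 2]
cols = [0, 2, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]
y = coo_spmv_segmented(rows, cols, vals, x, 3)
```

Separating the per-nonzero products from the per-row sums is what makes the work independent of row length; a GPU version would replace the sequential loop in phase 2 with a parallel segmented scan or reduction.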
“…The first column of the Table 16 reports the time necessary to load the original Matrix Market file [33] from disk. The following columns are instead the preprocessing times to compose and tune CSR, AdELL+ and BCCOO [28]. Note that the measurements for this latter format are available only in single-precision due to the lack of double-precision implementation.…”
Section: Online Auto-tuning (mentioning)
confidence: 99%