2017 46th International Conference on Parallel Processing (ICPP) 2017
DOI: 10.1109/icpp.2017.38

Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors

Abstract: This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. Architectural diversity among different processors together with structural diversity among different sparse matrices lead to bottleneck diversity. This justifies an SpMV optimizer that is both matrix- and architecture-adaptive through runtime specialization. To this direction, we present an approach that first identifies the performance bottlenecks of SpMV for a given sparse matrix on the target p…
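For readers unfamiliar with the kernel the abstract refers to, the following is a minimal baseline sketch of SpMV over a matrix in compressed sparse row (CSR) storage. It is not the paper's optimizer. The function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions, not identifiers from the paper.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    All names here are hypothetical, chosen for illustration only.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Indirect access to x through col_idx is what makes the kernel
        # sensitive to the sparsity structure of the matrix.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix (dense view):
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

The inner loop length varies per row and the reads of `x` are irregular, which is one plausible reason the dominant bottleneck can differ from matrix to matrix and from machine to machine.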

citations
Cited by 31 publications
(30 citation statements)
references
References 30 publications
(24 reference statements)
“…The categorization of the matrix is to be made based on install‐time information learned on the target machine and a dynamic analysis of the matrix. This is a typical auto‐tuning approach that has been successfully applied in the case of SpMV. Ideally, the dynamic analysis of the matrix should be as inexpensive as possible.…”
Section: Results (mentioning)
confidence: 99%
“…This is a typical auto-tuning approach that has been successfully applied in the case of SpMV. [15][16][17][18][19][20] Ideally, the dynamic analysis of the matrix should be as inexpensive as possible. In the case of CSRLenGoto, one can quickly check whether the given matrix falls into the category of "short rows" or "long rows".…”
Section: Results (mentioning)
confidence: 99%
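The inexpensive dynamic analysis described in the statement above could look roughly like the following sketch, which classifies a CSR matrix from its row-pointer array alone. The cited text does not state the actual criterion, so the mean-row-length test, the function name `classify_rows`, and the `threshold` default are all assumptions made for illustration.

```python
def classify_rows(row_ptr, threshold=8.0):
    """Label a CSR matrix as having "short rows" or "long rows".

    Assumption: the decision uses the mean number of nonzeros per row,
    compared against a hypothetical threshold that would, in practice,
    be learned at install time on the target machine.
    """
    n_rows = len(row_ptr) - 1
    if n_rows <= 0:
        raise ValueError("row_ptr must describe at least one row")
    # Total nonzeros divided by row count: constant-time, no pass over the data.
    mean_len = (row_ptr[-1] - row_ptr[0]) / n_rows
    return "short rows" if mean_len < threshold else "long rows"


short_label = classify_rows([0, 2, 3, 5])   # mean of 5/3 nonzeros per row
long_label = classify_rows([0, 10, 20])     # mean of 10 nonzeros per row
```

Because the check reads only the first and last entries of `row_ptr`, its cost is independent of the matrix size, which matches the stated goal of keeping the analysis as cheap as possible.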
“…Loop optimizations: unrolling [9,17,23,24,29,50,84,90]; collapsing [4,6,7,13,20,21,44,54]; splitting [22,28]
Blocking (tiling): in cache [14,15,18,20,21,22,27,39,44,52,54,69]; in registers [68,69]
Compile-time optimizations: using pre-computed values [35,52]; specifying array and loop bounds at compile time [6,54]
Compute-related optimizations: reusing intermediate variables [22,35]; using conflict-detection instruction of AVX-512 [52,85]; performing redundant computation to avoid data-communication or atomic operations [52,82]
Array transpose [6,79]…”
Section: Table 3, Optimization Strategies (mentioning)
confidence: 99%
“…Data layouts: SoA [20,22,23,28,30,36,50,79,82,90]; AoS [9]; AoSoA [57,90]
Data alignment [6,9,14,18,20,24,44,45,52,53,66,79,84,90]
Padding [4,7,9,20,24,44,52,53,79,82,91]
Dependency disambiguation [15,28,36,82,91]
Prefetching: software [4,7,9,14,17,22,23,40,41,50,…]”
Section: Table 3, Optimization Strategies (mentioning)
confidence: 99%