2005
DOI: 10.1007/11557654_91

Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure

Abstract: We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this kind of structure. Our technique splits the matrix A into a sum A_1 + A_2 + ... + A_s, where each term is stored in a new data structure, the unaligned block compressed sparse row (UBCSR) format. The classical alternative approach of storing A in …
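To make the splitting idea concrete, here is a minimal C sketch of the driver such a decomposition implies: the product y = A*x is accumulated term by term, with each term free to use its own storage format and kernel. The struct and function names (spmv_term, spmv_split) are illustrative assumptions, not the paper's actual UBCSR code.

```c
#include <stddef.h>

/* One term A_k of the split A = A_1 + ... + A_s.  The storage behind
 * `data` and the kernel behind `spmv` are left open on purpose. */
typedef struct {
    const void *data;                                        /* term-specific storage      */
    void (*spmv)(const void *data, const double *x, double *y); /* computes y += A_k * x   */
} spmv_term;

/* y = (A_1 + A_2 + ... + A_s) * x, accumulated one term at a time */
void spmv_split(const spmv_term *terms, size_t s,
                const double *x, double *y, size_t n_rows)
{
    for (size_t i = 0; i < n_rows; ++i)
        y[i] = 0.0;                          /* start from zero, then accumulate */
    for (size_t k = 0; k < s; ++k)
        terms[k].spmv(terms[k].data, x, y);  /* y += A_k * x */
}
```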

Cited by 88 publications (49 citation statements) | References 14 publications

“…They experiment on a K6, a Power3, and an Itanium II processor for a suite of 20 sparse matrices and validate the accuracy of the proposed performance model. Vuduc et al. [26] extend the notion of blocking to exploit variable block shapes by decomposing the original matrix into a sum of submatrices, storing each submatrix in a variation of the BCSR format. Their approach is tested on the Ultra2i, Pentium III-M, Power4, and Itanium II processors for a suite of 10 FEM matrices that contain dense sub-blocks.…”
Section: Related Work (mentioning)
confidence: 99%
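As a rough, hedged illustration of what such a BCSR variation might record, the sketch below gives one decomposed term its own block size and an explicit starting row per block row, so block rows need not be aligned; the field names and layout are assumptions for illustration, not the authors' exact UBCSR definition.

```c
#include <stddef.h>

/* One decomposed term with its own r x c block size, stored BCSR-style.
 * The brow_start array is what allows "unaligned" block rows: each block
 * row records the matrix row where it begins.  Illustrative sketch only. */
typedef struct {
    int r, c;              /* block dimensions chosen for this term          */
    int n_block_rows;      /* number of block rows                           */
    const int *brow_start; /* first matrix row of each block row             */
    const int *brow_ptr;   /* length n_block_rows + 1: blocks per block row  */
    const int *bcol_idx;   /* starting column of each r x c block            */
    const double *blocks;  /* dense r*c values per block, row-major          */
} blocked_term;

/* y += A_k * x for one term; generic in r and c, no unrolling */
void spmv_blocked_term(const blocked_term *A, const double *x, double *y)
{
    for (int I = 0; I < A->n_block_rows; ++I) {
        int row0 = A->brow_start[I];
        for (int b = A->brow_ptr[I]; b < A->brow_ptr[I + 1]; ++b) {
            const double *blk = A->blocks + (size_t)b * A->r * A->c;
            int col0 = A->bcol_idx[b];
            for (int i = 0; i < A->r; ++i)
                for (int j = 0; j < A->c; ++j)
                    y[row0 + i] += blk[i * A->c + j] * x[col0 + j];
        }
    }
}
```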
“…For example, blocking implemented with the Block Compressed Sparse Row (BCSR) format was proposed by Im and Yelick [11] as a transformation to tame irregular accesses to the input vector and exploit its inherent reuse, as in dense matrix optimizations. One-dimensional blocking is also proposed by Pinar and Heath [20] to reduce indirect memory references, while more recently Buttari et al. [5] and Vuduc and Moon [26] emphasize the merit of blocking (the latter with variable-sized blocks) as a transformation to reduce indirect references and enable register-level blocking and unrolling. However, it is not clear whether the benefits of blocking should be attributed to better cache utilization, memory access reduction, or ILP improvement.…”
Section: Introduction (mentioning)
confidence: 97%
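For reference, the register-level blocking and unrolling mentioned above can be sketched as a fixed 2x2 BCSR kernel: one column index is stored per block rather than per nonzero, and the 2x2 multiply is fully unrolled so the partial sums stay in registers. The array names follow common BCSR convention and are not tied to any of the cited implementations.

```c
/* Sketch of a register-blocked SpMV for a fixed 2x2 BCSR layout:
 * one column index per block (fewer indirect references than CSR) and a
 * fully unrolled 2x2 multiply so partial sums stay in registers. */
void spmv_bcsr_2x2(int n_block_rows,
                   const int *brow_ptr,    /* length n_block_rows + 1       */
                   const int *bcol_idx,    /* starting column of each block */
                   const double *blocks,   /* 4 values per block, row-major */
                   const double *x, double *y)
{
    for (int I = 0; I < n_block_rows; ++I) {
        double y0 = 0.0, y1 = 0.0;                 /* register accumulators */
        for (int b = brow_ptr[I]; b < brow_ptr[I + 1]; ++b) {
            const double *blk = &blocks[4 * b];
            double x0 = x[bcol_idx[b]];
            double x1 = x[bcol_idx[b] + 1];
            y0 += blk[0] * x0 + blk[1] * x1;       /* unrolled 2x2 product  */
            y1 += blk[2] * x0 + blk[3] * x1;
        }
        y[2 * I]     = y0;
        y[2 * I + 1] = y1;
    }
}
```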
“…However, as one of the "seven dwarfs" [1], SpMV is notorious for sustaining a low fraction (less than 10% [2]) of peak performance on general-purpose processors, mostly due to inefficient use of memory bandwidth. This stems from mismatches between the memory access pattern and the compression scheme of the sparse matrix.…”
Section: Introduction (mentioning)
confidence: 99%
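For contrast, a plain CSR kernel makes this mismatch visible: every nonzero costs an indirect load of x through its column index, so accesses to the input vector are irregular and the kernel is usually bounded by memory bandwidth rather than arithmetic. The sketch below uses the standard CSR arrays and is not taken from any of the cited papers.

```c
/* Plain CSR SpMV: y = A * x.  Each nonzero triggers an indirect load
 * x[col_idx[j]], so accesses to x are irregular and the kernel is
 * typically limited by memory bandwidth rather than arithmetic. */
void spmv_csr(int n_rows,
              const int *row_ptr,    /* length n_rows + 1            */
              const int *col_idx,    /* column index of each nonzero */
              const double *val,     /* nonzero values               */
              const double *x, double *y)
{
    for (int i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            sum += val[j] * x[col_idx[j]];    /* indirect access to x */
        y[i] = sum;
    }
}
```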