2005
DOI: 10.2172/891708

Fast sparse matrix-vector multiplication by exploiting variable block structure

Abstract: We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this kind of structure. Our technique splits the matrix, A, into a sum, A_1 + A_2 + ... + A_s, where each term is stored in a new data structure, unaligned block compressed sparse row (UBCSR) format. The classical alternative approach of storing A in …
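The splitting idea in the abstract can be illustrated with a minimal Python sketch. It is not the report's UBCSR implementation: the `BlockTerm` class, the `spmv_split` function, and the flat list-of-blocks layout are hypothetical names and simplifications chosen for illustration. Each term A_k holds dense r x c blocks that may start at any row and column, and y = A x is accumulated as the sum of the per-term products.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlockTerm:
    """One term A_k of the split (hypothetical layout, not the report's UBCSR).

    Every block in a term has the same r x c shape, but a block may start
    at any row and column, i.e. blocks are not aligned to multiples of r or c.
    """
    r: int
    c: int
    # Each entry: (first_row, first_col, values in row-major order, length r*c)
    blocks: List[Tuple[int, int, List[float]]]


def spmv_split(terms: List[BlockTerm], x: List[float], n_rows: int) -> List[float]:
    """Compute y = (A_1 + A_2 + ... + A_s) x by summing the per-term products."""
    y = [0.0] * n_rows
    for t in terms:
        for i0, j0, vals in t.blocks:
            for di in range(t.r):
                acc = 0.0
                for dj in range(t.c):
                    acc += vals[di * t.c + dj] * x[j0 + dj]
                y[i0 + di] += acc
    return y


# Example matrix (4 x 4):
#   [1 2 0 0]
#   [3 4 5 6]
#   [0 0 7 8]
#   [0 9 0 0]
# A_1 holds two 2x2 blocks; the second starts at row 1, which is not a
# multiple of 2, so it is unaligned. A_2 holds the leftover 1x1 entry.
terms = [
    BlockTerm(r=2, c=2, blocks=[(0, 0, [1, 2, 3, 4]), (1, 2, [5, 6, 7, 8])]),
    BlockTerm(r=1, c=1, blocks=[(3, 1, [9])]),
]
x = [1.0, 2.0, 3.0, 4.0]
y = spmv_split(terms, x, n_rows=4)
print(y)  # [5.0, 50.0, 53.0, 18.0]
```

Storing the stray entry in its own 1 x 1 term avoids padding a 2 x 2 block with three explicit zeros, which is the trade-off the splitting approach is meant to exploit.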

Cited by 51 publications (61 citation statements)
References: 21 publications
“…At the low-level, future work will investigate rectangular register blocks as mentioned in Section V, and using variable sized blocks via splitting [31]. A practical approach to address variable blocking that exploits the recursive structure of CSB is as follows.…”
Section: Discussion (mentioning)
confidence: 99%
“…For instance, matrices with banded structure or where nonzeros are grouped in (almost) dense blocks occur often in practice. This insight can be used to create more optimised block-based storage formats [13], where only the position of nonzero blocks needs to be stored. This reduces the amount of metadata to store and increases the computational efficiency due to the dense local structure.…”
Section: Background and Related Work (mentioning)
confidence: 99%
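The metadata saving described in this statement can be made concrete with a small count of stored indices. This is a rough sketch under stated assumptions: the function name `index_counts` is hypothetical, every block is assumed fully dense (no padding), and the number of rows is assumed to be a multiple of r. It compares a CSR layout (one column index per nonzero, plus row pointers) with an aligned block layout (one column index per block, plus block-row pointers).

```python
def index_counts(n_blocks: int, r: int, c: int, n_rows: int) -> tuple:
    """Return (csr_indices, block_indices) for a matrix of dense r x c blocks.

    Assumes every block is fully dense and n_rows is a multiple of r.
    """
    nnz = n_blocks * r * c
    csr_indices = nnz + (n_rows + 1)               # column indices + row pointers
    block_indices = n_blocks + (n_rows // r + 1)   # block column indices + block-row pointers
    return csr_indices, block_indices


csr, blocked = index_counts(n_blocks=1000, r=3, c=3, n_rows=900)
print(csr, blocked)  # 9901 1301
```

When blocks are only almost dense, the explicit zeros needed to fill them offset part of this saving.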
“…BCSR blocks are row- and column-aligned at r and c elements boundaries, respectively. Although this alignment may seem restrictive and, generally, lead to more padding [14], it can greatly favor vectorization as it will be explained in the following. Figure 2 shows the SpMV kernel for BCSR with 2 × 2 blocks.…”
Section: Storage Formats For Sparse Matrices (mentioning)
confidence: 99%
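The figure referenced in this statement is not reproduced here. As an independent illustration, a BCSR SpMV kernel for 2 x 2 blocks might look like the following Python sketch. The array names `brow_ptr`, `bcol_ind`, and `bval` follow a common convention but are assumptions, not taken from the cited paper. Blocks are aligned: block row I covers rows 2I and 2I+1, and block column J covers columns 2J and 2J+1.

```python
def bcsr_2x2_spmv(brow_ptr, bcol_ind, bval, x):
    """y = A x for A stored in BCSR with aligned 2 x 2 blocks.

    brow_ptr[I]..brow_ptr[I+1] indexes the blocks of block row I,
    bcol_ind[k] is the block column of block k, and bval stores each
    block's four values contiguously in row-major order.
    """
    n_block_rows = len(brow_ptr) - 1
    y = [0.0] * (2 * n_block_rows)
    for I in range(n_block_rows):
        y0 = 0.0
        y1 = 0.0
        for k in range(brow_ptr[I], brow_ptr[I + 1]):
            j = 2 * bcol_ind[k]   # first column covered by this block
            b = 4 * k             # offset of this block's values
            x0, x1 = x[j], x[j + 1]
            y0 += bval[b] * x0 + bval[b + 1] * x1
            y1 += bval[b + 2] * x0 + bval[b + 3] * x1
        y[2 * I] = y0
        y[2 * I + 1] = y1
    return y


# Example matrix (4 x 4); the zeros inside stored blocks are explicit padding:
#   [1 2 0 0]
#   [3 4 0 0]
#   [0 7 5 0]
#   [0 0 0 6]
brow_ptr = [0, 1, 3]
bcol_ind = [0, 0, 1]
bval = [1, 2, 3, 4,   0, 7, 0, 0,   5, 0, 0, 6]
x = [1.0, 2.0, 3.0, 4.0]
print(bcsr_2x2_spmv(brow_ptr, bcol_ind, bval, x))  # [5.0, 11.0, 29.0, 24.0]
```

The block holding only the value 7 stores three padded zeros, which illustrates the padding cost that the statement attributes to alignment.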
“…Consequently, proper alignment of data should be considered as a prerequisite for performance when trying to vectorize SpMV. For this reason, BCSR compared to Unaligned BCSR (UBCSR) [14] is a more appropriate data structure for vectorization, since the logically aligned blocks of BCSR can be easily aligned in memory without any extra padding. Another not so obvious implication of the alignment requirements is that blocks not having at least one even dimension, such as the 3×1 and 3×3 blocks, cannot be efficiently vectorized, since they cannot be naturally aligned without effectively collapsing to larger blocks.…”
Section: Architectural Implications On the Execution Of Blocked And V… (mentioning)
confidence: 99%