Proceedings Supercomputing '92
DOI: 10.1109/superc.1992.236712
A high performance algorithm using pre-processing for the sparse matrix-vector multiplication

Cited by 25 publications (30 citation statements) | References 6 publications
“…Two strategies have been proposed in the literature to avoid padding: (a) decompose the original matrix into two or more matrices, where each matrix contains dense sub-blocks of some common pattern (e.g., rectangular or diagonal blocks), while the last matrix contains the remainder elements in a standard sparse storage format [1]; and (b) use variable-sized blocks [12], [13]. In the following, we will present each blocking method in more detail.…”
Section: An Overview of Blocking Storage Formats
confidence: 99%
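The decomposition strategy (a) quoted above can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the storage format of [1] itself: fixed 1×2 dense sub-blocks are stored separately, the leftover nonzeros stay in plain CSR, and y = A·x is accumulated from both partial products.

```python
# Sketch (hypothetical, minimal) of strategy (a): split a sparse matrix
# into a matrix of dense 1x2 sub-blocks plus a CSR remainder, then
# compute y = A*x as the sum of the two partial products.

def spmv_csr(indptr, indices, data, x, y):
    """Accumulate a CSR matrix-vector product into y."""
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]

def spmv_blocks_1x2(blocks, x, y):
    """Accumulate the product of 1x2 dense sub-blocks into y.
    Each block is (row, first_col, (v0, v1))."""
    for i, j, (v0, v1) in blocks:
        y[i] += v0 * x[j] + v1 * x[j + 1]

# Example matrix A = [[1, 2, 0],
#                     [0, 0, 3],
#                     [4, 5, 6]]
# 1x2 blocks cover A[0,0:2] and A[2,0:2]; the CSR remainder
# holds A[1,2] and A[2,2] -- no padding is needed.
blocks = [(0, 0, (1.0, 2.0)), (2, 0, (4.0, 5.0))]
indptr, indices, data = [0, 0, 1, 2], [2, 2], [3.0, 6.0]

x = [1.0, 1.0, 1.0]
y = [0.0, 0.0, 0.0]
spmv_blocks_1x2(blocks, x, y)
spmv_csr(indptr, indices, data, x, y)
print(y)  # -> [3.0, 3.0, 15.0]
```

The point of the split is that the blocked part needs only one column index per two values, while irregular nonzeros avoid zero padding by falling through to the remainder matrix.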
“…A version of this format was initially proposed in [1] as part of a decomposition method that extracted common dense sub-blocks from the input matrix. A similar format, called RSDIAG, is also presented in [15], but it maintains an additional structure that stores the total number of diagonals in each segment.…”
Section: Blocking With Padding
confidence: 99%
“…Several methods have been proposed in the literature to improve cache locality for SpMxV operations by reordering the rows and/or columns of the matrix using graph/hypergraph partitioning [6], [7], [8], [9], [10] and other techniques [11], [12], [13], [14]. The recommendation algorithm used in theadvisor is direction aware.…”
Section: Introduction
confidence: 99%
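The row/column reordering mentioned in this excerpt rests on a simple identity: for a permutation matrix P, computing with B = P·A·Pᵀ and the permuted vector P·x yields a permuted copy of A·x. A minimal sketch (hypothetical helper names, COO storage for brevity; the cited methods choose the permutation via graph/hypergraph partitioning, which is not shown here):

```python
# Sketch (hypothetical): symmetric row/column reordering, the basic
# transformation behind the cache-locality methods in the excerpt.
# With permutation p (new index -> old index), B = P A P^T satisfies
# (B @ (P x))[i] == (A @ x)[p[i]].

def spmv_coo(n, coo, x):
    """y = A*x for a matrix stored as (row, col, value) triples."""
    y = [0.0] * n
    for i, j, v in coo:
        y[i] += v * x[j]
    return y

def permute_sym(coo, p):
    """Relabel rows and columns of a COO matrix by permutation p."""
    inv = {old: new for new, old in enumerate(p)}
    return [(inv[i], inv[j], v) for i, j, v in coo]

coo = [(0, 0, 1.0), (0, 2, 2.0), (2, 2, 3.0)]
x = [1.0, 2.0, 3.0]
p = [2, 0, 1]                        # chosen new row/column order

y = spmv_coo(3, coo, x)              # original product
xp = [x[i] for i in p]               # P x
yp = spmv_coo(3, permute_sym(coo, p), xp)
assert yp == [y[i] for i in p]       # permuted result matches P y
```

A locality-oriented method would pick p so that nonzeros in each row reference nearby entries of x, improving cache reuse during the multiply.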
“…Performing this operation using the CSR format is trivial, but it was observed that the maximum performance in Mflop/s sustained by a naïve implementation reaches only a small fraction of the machine's peak performance [14]. As a means of transcending this limit, several optimization techniques have been proposed, such as reordering [24,28,29,32], data compression [22,33], blocking [1,15,23,24,28,29,31], vectorization [4,11], loop unrolling [32] and jamming [21], and software prefetching [29]. Lately, the dissemination of multi-core computers has promoted multi-threading as an important tuning technique, which can be further combined with the purely sequential methods.…”
Section: Introduction
confidence: 99%
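Of the optimizations listed in this last excerpt, blocking is the one most directly tied to the cited paper. A minimal sketch of one common variant, register blocking with fixed 2×2 blocks (BCSR); this is an illustrative toy, not the specific format of any cited reference. Blocks are padded with explicit zeros, trading a few wasted multiplies for one column index per four values and unrolled inner arithmetic:

```python
# Sketch (hypothetical): SpMV for a BCSR matrix with fixed 2x2 blocks,
# illustrating the "blocking" optimization named in the excerpt.
# indptr/indices index block rows and block columns; data stores
# 4 values per block in row-major order, zero-padded where needed.

def spmv_bcsr_2x2(indptr, indices, data, x):
    nbrows = len(indptr) - 1
    y = [0.0] * (2 * nbrows)
    for bi in range(nbrows):
        y0 = y1 = 0.0                      # register-resident partial sums
        for k in range(indptr[bi], indptr[bi + 1]):
            j = 2 * indices[k]             # first column of this block
            v = data[4 * k : 4 * k + 4]
            y0 += v[0] * x[j] + v[1] * x[j + 1]
            y1 += v[2] * x[j] + v[3] * x[j + 1]
        y[2 * bi] = y0
        y[2 * bi + 1] = y1
    return y

# A = [[1, 2, 0, 0],
#      [3, 4, 0, 0],
#      [0, 0, 5, 0],
#      [0, 0, 6, 7]]   (block (1,1) carries one explicit padding zero)
indptr = [0, 1, 2]
indices = [0, 1]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 6.0, 7.0]
print(spmv_bcsr_2x2(indptr, indices, data, [1.0, 1.0, 1.0, 1.0]))
# -> [3.0, 7.0, 5.0, 13.0]
```

The padding cost is exactly what the decomposition and variable-block strategies quoted earlier in this report are designed to avoid.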