Parallel Computing 2000
DOI: 10.1142/9781848160170_0036

Towards a Fast Parallel Sparse Matrix-Vector Multiplication

Abstract: The sparse matrix-vector product is an important computational kernel that runs inefficiently on many computers with super-scalar RISC processors. In this paper we analyse the performance of the sparse matrix-vector product with symmetric matrices originating from the FEM and describe techniques that lead to a fast implementation. It is shown how these optimisations can be incorporated into an efficient parallel implementation using message-passing. We conduct numerical experiments on many different machines an…
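
For context on the symmetric storage the abstract refers to: with only the lower triangle of the matrix stored, each off-diagonal entry can be applied twice per pass, roughly halving the memory traffic for matrix data. The following is a minimal sketch in plain C, assuming a CSR layout of the lower triangle; the names and layout are illustrative, not the authors' implementation.

```c
#include <stddef.h>

/* Symmetric SpMV sketch: only the lower triangle (including the
 * diagonal) is stored in CSR format.  Each strictly-lower entry a_ij
 * contributes to both y_i and y_j, so the matrix data is read once
 * but used twice.  (Illustrative code, not the paper's kernel.) */
void spmv_sym_csr(size_t n,
                  const size_t *row_ptr,   /* n+1 entries             */
                  const size_t *col_idx,   /* column of each nonzero  */
                  const double *val,       /* nonzero values          */
                  const double *x,
                  double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = 0.0;

    for (size_t i = 0; i < n; ++i) {
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            size_t j = col_idx[k];
            y[i] += val[k] * x[j];
            if (j != i)              /* mirror the strictly lower part */
                y[j] += val[k] * x[i];
        }
    }
}
```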

Cited by 22 publications (26 citation statements); references 5 publications.

“…A number of works consider techniques that compress the data structure by recognizing patterns in order to eliminate the integer index overhead. These patterns include blocks [10], variable or mixtures of differently-sized blocks [6], diagonals, which may be especially well-suited to machines with SIMD and vector units [19], dense subtriangles arising in sparse triangular solve [22], symmetry [11], and combinations thereof.…”
Section: OSKI, OSKI-PETSc, and Related Work
confidence: 99%
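
The index-compression idea in the quote above can be made concrete with fixed-size register blocking (BCSR), where one column index is stored per r×c block instead of per nonzero. A minimal 2×2 sketch, with hypothetical names and layout, might look like this:

```c
#include <stddef.h>

/* BCSR SpMV sketch with fixed 2x2 blocks: one column index per block
 * rather than one per nonzero, which is how blocking cuts integer
 * index overhead.  Assumes the dimension is even and that explicit
 * zeros pad partially filled blocks.  (Hypothetical layout.) */
void spmv_bcsr_2x2(size_t n_block_rows,
                   const size_t *brow_ptr,  /* n_block_rows+1 entries      */
                   const size_t *bcol_idx,  /* block column per block      */
                   const double *bval,      /* 4 values/block, row-major   */
                   const double *x,
                   double *y)
{
    for (size_t ib = 0; ib < n_block_rows; ++ib) {
        double y0 = 0.0, y1 = 0.0;
        for (size_t k = brow_ptr[ib]; k < brow_ptr[ib + 1]; ++k) {
            const double *b  = &bval[4 * k];
            const double *xp = &x[2 * bcol_idx[k]];
            y0 += b[0] * xp[0] + b[1] * xp[1];
            y1 += b[2] * xp[0] + b[3] * xp[1];
        }
        y[2 * ib]     = y0;   /* overwrite, not accumulate */
        y[2 * ib + 1] = y1;
    }
}
```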
“…Better low-level tuning of the kind proposed in this paper, even applied to just a CSR SpMV, is also possible. Recent work on low-level tuning of SpMV by unroll-and-jam [12], software pipelining [6], and prefetching [17] influences our work. See [19] for an extensive overview of SpMV optimization techniques.…”
Section: OSKI, OSKI-PETSc, and Related Work
confidence: 99%
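
As a concrete instance of the low-level tuning the quote mentions, the inner CSR loop can be unrolled so the processor has several independent partial sums to schedule. This is a generic sketch of the technique, not code from any of the cited papers; compilers may perform the same transformation automatically, so measurement is essential.

```c
#include <stddef.h>

/* CSR SpMV with the inner loop unrolled by four.  Four independent
 * accumulators break the dependence chain on a single sum, giving a
 * super-scalar core more instruction-level parallelism. */
void spmv_csr_unroll4(size_t n,
                      const size_t *row_ptr,
                      const size_t *col_idx,
                      const double *val,
                      const double *x,
                      double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t k = row_ptr[i], end = row_ptr[i + 1];
        for (; k + 4 <= end; k += 4) {
            s0 += val[k]     * x[col_idx[k]];
            s1 += val[k + 1] * x[col_idx[k + 1]];
            s2 += val[k + 2] * x[col_idx[k + 2]];
            s3 += val[k + 3] * x[col_idx[k + 3]];
        }
        for (; k < end; ++k)       /* remainder of the row */
            s0 += val[k] * x[col_idx[k]];
        y[i] = (s0 + s1) + (s2 + s3);
    }
}
```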
“…These patterns include blocks [13], variable or mixtures of differently-sized blocks [12], diagonals, which may be especially well-suited to machines with SIMD and vector units [32,28], general pattern compression [33], value compression [15], and combinations thereof.…”
Section: Related Work
confidence: 99%
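
The simplest form of the index compression alluded to here is narrowing the index type itself: when the column dimension fits in 16 bits, indices can be stored as uint16_t, halving index bandwidth relative to 32-bit indices. This sketch only illustrates that baseline idea; the pattern- and value-compression schemes cited in the quote are considerably more elaborate.

```c
#include <stddef.h>
#include <stdint.h>

/* CSR SpMV with 16-bit column indices.  Valid only when the matrix
 * has at most 65536 columns; the narrower index stream reduces the
 * memory traffic that usually dominates SpMV. */
void spmv_csr_idx16(size_t n,
                    const size_t   *row_ptr,
                    const uint16_t *col_idx,   /* assumes <= 65536 columns */
                    const double   *val,
                    const double   *x,
                    double         *y)
{
    for (size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            s += val[k] * x[col_idx[k]];
        y[i] = s;
    }
}
```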
“…Researchers have also examined low-level tuning of SpMV by unroll-and-jam [20], software pipelining [12], and prefetching [26]. A completely recursive layout for SpMV, motivated by CSB, has recently been examined by Martone et al. [19].…”
Section: Related Work
confidence: 99%
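
For the prefetching variant named in the quote, one common shape is to issue an explicit prefetch for matrix data a fixed distance ahead of the current nonzero. The sketch below uses __builtin_prefetch, a GCC/Clang extension; the distance PF_DIST is a tuning knob assumed here for illustration, not a universally good value.

```c
#include <stddef.h>

#define PF_DIST 16   /* prefetch distance, in nonzeros; machine-dependent */

/* CSR SpMV with software prefetch of the value stream and of the
 * irregularly accessed x entries PF_DIST nonzeros ahead. */
void spmv_csr_prefetch(size_t n,
                       const size_t *row_ptr,
                       const size_t *col_idx,
                       const double *val,
                       const double *x,
                       double *y)
{
    size_t nnz = row_ptr[n];
    for (size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            if (k + PF_DIST < nnz) {
                __builtin_prefetch(&val[k + PF_DIST], 0, 0);
                __builtin_prefetch(&x[col_idx[k + PF_DIST]], 0, 0);
            }
            s += val[k] * x[col_idx[k]];
        }
        y[i] = s;
    }
}
```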
“…The inspiration for this study comes from recent work on splitting by Geus and Röllin [11], Pinar and Heath [25], and Toledo [33], and the performance gap we have observed informally [15,37,14]. Geus and Röllin explore up to 3-way splittings for a particular application matrix used in accelerator cavity design, but the splitting terms are still based on row-aligned BCSR format.…”
Section: Related Work
confidence: 99%
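
The splitting idea credited here to Geus and Röllin, Pinar and Heath, and Toledo stores A as a sum of terms, each in a format suited to its part of the nonzero pattern, and computes y = A·x as accumulating passes over the terms. The sketch below uses a 2×2 BCSR term plus a CSR remainder; that concrete choice is illustrative, not any cited paper's scheme.

```c
#include <stddef.h>

/* Splitting sketch: A = A1 + A2, where A1 holds nonzeros that fit
 * 2x2 blocks (BCSR) and A2 holds the leftover entries (CSR).
 * y = A*x is two passes that accumulate into y. */
void spmv_split(size_t n,                 /* dimension, assumed even */
                /* A1: 2x2 BCSR part */
                const size_t *brow_ptr, const size_t *bcol_idx,
                const double *bval,
                /* A2: CSR remainder */
                const size_t *row_ptr, const size_t *col_idx,
                const double *val,
                const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = 0.0;

    /* Pass 1: blocked term (low index overhead, register reuse). */
    for (size_t ib = 0; ib < n / 2; ++ib)
        for (size_t k = brow_ptr[ib]; k < brow_ptr[ib + 1]; ++k) {
            const double *b  = &bval[4 * k];
            const double *xp = &x[2 * bcol_idx[k]];
            y[2 * ib]     += b[0] * xp[0] + b[1] * xp[1];
            y[2 * ib + 1] += b[2] * xp[0] + b[3] * xp[1];
        }

    /* Pass 2: unblocked remainder in plain CSR. */
    for (size_t i = 0; i < n; ++i)
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            y[i] += val[k] * x[col_idx[k]];
}
```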