2010
DOI: 10.1007/s00224-010-9285-4

Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model

Abstract: We analyze the problem of sparse-matrix dense-vector multiplication (SpMV) in the I/O model. In SpMV, the objective is to compute y = Ax, where A is a sparse matrix and x and y are dense vectors. We give tight upper and lower bounds on the number of block transfers as a function of the sparsity k, the number of nonzeros per column of A. The parameter k is a knob that bridges the problems of permuting (k = 1) and dense matrix multiplication (k = N). When the nonzero elements of A are stored in column-major order, …
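Concretely, with A stored in column-major order the computation scans the nonzeros one column at a time. Below is a minimal sketch, assuming a standard compressed-sparse-column (CSC) layout; the names spmv_csc, col_ptr, row_idx, and vals are illustrative, not from the paper.

```python
def spmv_csc(n_rows, col_ptr, row_idx, vals, x):
    """Compute y = A x with A in compressed-sparse-column (column-major) order."""
    y = [0.0] * n_rows
    for j, xj in enumerate(x):                    # one pass over the columns of A
        for p in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[p]] += vals[p] * xj         # scatter into the output vector
    return y

# 3x3 example with k = 2 nonzeros per column:
# A = [[1, 0, 2],
#      [3, 4, 0],
#      [0, 5, 6]]
col_ptr = [0, 2, 4, 6]
row_idx = [0, 1, 1, 2, 0, 2]
vals    = [1.0, 3.0, 4.0, 5.0, 2.0, 6.0]
print(spmv_csc(3, col_ptr, row_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 7.0, 11.0]
```

In the I/O model the interesting cost is not the arithmetic but the block transfers incurred by the scattered accesses to y, which is roughly what the paper's bounds quantify.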

Cited by 37 publications (58 citation statements)
References 12 publications (10 reference statements)
“…Then I(A, x) = 1, 2, 3, 4, 9, 10, 11, 12, 5, 6, 7, 8, 13, 14, 15, 16 is a run with the stripes I(A, x) = (1, 2, 3, 4), (9, 10, 11, 12), (5, 6, 7, 8), (13, 14, 15, 16).…”
Section: Definition 2 (Runs): A Sequence Of Memory Locations Is Called…
confidence: 99%
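To make the run/stripe decomposition concrete, here is a small sketch (a hypothetical helper, not from the cited paper) that partitions a run into maximal stripes of consecutive memory locations, reproducing the example above:

```python
def stripes(run):
    """Partition a run into maximal stripes of consecutive memory locations."""
    out = [[run[0]]]
    for a in run[1:]:
        if a == out[-1][-1] + 1:   # address continues the current stripe
            out[-1].append(a)
        else:                      # a jump in addresses starts a new stripe
            out.append([a])
    return out

run = [1, 2, 3, 4, 9, 10, 11, 12, 5, 6, 7, 8, 13, 14, 15, 16]
print(stripes(run))
# [[1, 2, 3, 4], [9, 10, 11, 12], [5, 6, 7, 8], [13, 14, 15, 16]]
```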
“…In the first phase we read the contents of run (1, 2, 3, 4) and write it into run (5, 6, 7, 8). In the second phase the contents of run (9, 10, 11, 12) are written into run (13, 14, 15, 16). Each run here consists of a single stripe, and we read from exactly one run in each phase.…”
Section: (And the Runs Are Defined At The Beginning Of A Phase); And…
confidence: 99%
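A toy simulation of those two phases, assuming a flat memory addressed from 1 (the phase structure is taken from the statement above; the variable names and memory contents are illustrative):

```python
mem = {a: f"v{a}" for a in range(1, 17)}            # toy memory contents

phases = [
    ((1, 2, 3, 4), (5, 6, 7, 8)),                   # phase 1: read one run, write one run
    ((9, 10, 11, 12), (13, 14, 15, 16)),            # phase 2
]

for src, dst in phases:
    block = [mem[a] for a in src]                   # read from exactly one run per phase
    for a, v in zip(dst, block):
        mem[a] = v                                  # write into the single-stripe target run
```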
“…The second algorithmic direction strives to achieve optimal theoretical I/O complexity by using cache-oblivious algorithms [3]. From a high-level view, Bender's algorithm first generates all the intermediate triples of the output vector y, possibly with repeating indices.…”
Section: Related Work
confidence: 99%
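The triple-generation idea can be sketched as follows. This is an in-memory caricature, under the assumption that repeated output indices are combined by sorting the partial products; the real I/O-efficient versions do this grouping with block-wise sorting or merging, and the name spmv_via_triples is hypothetical.

```python
from collections import defaultdict

def spmv_via_triples(nnz, x):
    """nnz is a list of (i, j, a_ij) triples of A; returns y as a dict i -> y_i."""
    partials = [(i, a_ij * x[j]) for (i, j, a_ij) in nnz]  # one partial product per nonzero
    partials.sort(key=lambda t: t[0])    # group equal output indices together
    y = defaultdict(float)
    for i, v in partials:
        y[i] += v                        # reduce the repeated indices
    return dict(y)
```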
“…Bender et al [5] extended the sequential communication lower bounds introduced in [14] to sparse matrix vector multiplication. This lower bound is relevant to our analysis of Krylov subspace methods, which essentially perform repeated sparse matrix vector multiplications.…”
Section: Previous Work
confidence: 99%
“…This lower bound is relevant to our analysis of Krylov subspace methods, which essentially perform repeated sparse matrix vector multiplications. However, [5] used a sequential memory hierarchy model and established bounds in terms of memory size and track (cacheline) size, while we focus on interprocessor communication.…”
Section: Previous Work
confidence: 99%
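For intuition on why the SpMV bound matters to Krylov subspace methods: such methods build their basis by repeatedly applying A, so each iteration pays the SpMV I/O cost. A minimal sketch, reusing the spmv_csc routine sketched after the abstract (the name krylov_basis is illustrative):

```python
def krylov_basis(n_rows, col_ptr, row_idx, vals, v0, s):
    """Return [v0, A v0, A^2 v0, ..., A^s v0] via s repeated SpMVs."""
    basis = [v0]
    for _ in range(s):
        basis.append(spmv_csc(n_rows, col_ptr, row_idx, vals, basis[-1]))
    return basis
```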