2016
DOI: 10.1109/tpds.2015.2453970

Locality-Aware Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication on Many-Core Processors

Abstract: Sparse matrix-vector and matrix-transpose-vector multiplication (SpMMᵀV) repeatedly performed as z ← Aᵀx and y ← Az (or y ← Aw) for the same sparse matrix A is a kernel operation widely used in various iterative solvers. One important optimization for serial SpMMᵀV is reusing A-matrix nonzeros, which halves the memory bandwidth requirement. However, thread-level parallelization of SpMMᵀV that reuses A-matrix nonzeros necessitates concurrent writes to the same output-vector entries. These concurrent writes …
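As a concrete illustration of the nonzero-reuse optimization mentioned in the abstract, the following is a minimal serial sketch of a fused SpMMᵀV kernel over a CSR matrix, computing z ← Aᵀx and y ← Aw in a single pass so that each nonzero a_ij is loaded only once. This is an illustrative sketch only, not the paper's locality-aware parallel method; the function and parameter names are assumptions.

```c
#include <stddef.h>

/* Fused serial SpMMTV on a CSR matrix A (nrows x ncols):
 * computes z = A^T x and y = A w in one sweep over the nonzeros,
 * so each a_ij is read from memory once and used for both outputs. */
void spmmtv_csr(size_t nrows, size_t ncols,
                const size_t *rowptr,   /* length nrows + 1          */
                const size_t *colind,   /* length rowptr[nrows]      */
                const double *val,      /* length rowptr[nrows]      */
                const double *x,        /* length nrows, input to A^T x */
                const double *w,        /* length ncols, input to A w   */
                double *z,              /* length ncols, output z = A^T x */
                double *y)              /* length nrows, output y = A w   */
{
    for (size_t j = 0; j < ncols; ++j)
        z[j] = 0.0;

    for (size_t i = 0; i < nrows; ++i) {
        double yi = 0.0;
        double xi = x[i];
        for (size_t k = rowptr[i]; k < rowptr[i + 1]; ++k) {
            size_t j = colind[k];
            double a = val[k];   /* a_ij is loaded once ...          */
            z[j] += a * xi;      /* ... used for z = A^T x           */
            yi   += a * w[j];    /* ... and reused for y = A w       */
        }
        y[i] = yi;
    }
}
```

If the outer loop over rows were parallelized naively across threads, the updates z[j] += a * xi issued from different rows could target the same entry of z, which is exactly the concurrent-write problem the abstract refers to.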

Cited by 18 publications (7 citation statements) | References 29 publications
“…This pair of operations cannot be calculated simultaneously because they are data dependent, whereas SpMMᵀV consists of two independent operations. The paper 13 acknowledges that the SpMMᵀV operations z ← Aᵀx and y ← Aw are used in certain algorithms but does not investigate SpMMᵀV in detail.…”
Section: Related Work
confidence: 99%
“…RCM is used in [31] for bandwidth reduction of sparse matrix A on the Xeon Phi coprocessor. For sparse matrix-vector and matrix-transpose-vector multiplication (SpMMᵀV), which contains two consecutive SpMVs, Karsavuran et al. [32] utilize hypergraph models for exploiting temporal locality on Xeon Phi.…”
Section: Related Work
confidence: 99%
“…However, the experiments by Beamer et al. have demonstrated that cache blocking is not effective for large scale-free graphs [3]. Others have proposed vertex reordering techniques based on hypergraph partitioning to improve temporal and spatial locality [41], [42]. These techniques require expensive preprocessing operations.…”
Section: Related Work
confidence: 99%