2013
DOI: 10.21236/ada580140

Communication Optimal Parallel Multiplication of Sparse Random Matrices

Abstract: Parallel algorithms for sparse matrix-matrix multiplication typically spend most of their time on inter-processor communication rather than on computation, and hardware trends predict the relative cost of communication will only increase. Thus, sparse matrix multiplication algorithms must minimize communication costs in order to scale to large processor counts. In this paper, we consider multiplying sparse matrices corresponding to Erdős-Rényi random graphs on distributed-memory parallel machines. We prove a ne…
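As a concrete, sequential illustration of the setting the abstract describes (not the paper's parallel algorithm), the sketch below builds two Erdős-Rényi-style sparse matrices with SciPy and multiplies them; the dimension n and the expected nonzeros per row d are arbitrary illustrative choices, and scipy.sparse.random stands in for the paper's random-graph model.

```python
# Sketch only: form two Erdos-Renyi-style sparse matrices and multiply them
# to illustrate the SpGEMM setting discussed in the abstract.
import scipy.sparse as sp

n = 1 << 13          # matrix dimension (arbitrary illustrative choice)
d = 8                # expected nonzeros per row, so density = d / n
density = d / n

# sp.random places nonzeros uniformly at random, matching the adjacency
# structure of an Erdos-Renyi random graph G(n, p) with p = d / n.
A = sp.random(n, n, density=density, format="csr", random_state=0)
B = sp.random(n, n, density=density, format="csr", random_state=1)

C = A @ B            # sequential SpGEMM; the paper studies its
                     # communication-optimal parallel counterpart
print(f"nnz(A)={A.nnz}, nnz(B)={B.nnz}, nnz(C)={C.nnz}")
```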

Cited by 41 publications (56 citation statements).
References 9 publications (11 reference statements).
“…We omit the proof: see Ballard (2013, §4.2.2.4) for full details. This bound is attainable in the sequential case by the algorithm presented in Ballard et al (2013c). See Section 3.3.4 for further discussion of symmetric indefinite algorithms.…”
Section: LTL^T Factorization (mentioning)
confidence: 96%
“…For algorithms attaining this bound in the dense case, see Section 3.3.1. For further discussion of this bound in the sparse case, see Ballard et al (2013c).…”
Section: Corollary 28, The Bandwidth Cost Lower Bound for Classical… (mentioning)
confidence: 99%
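For context (not a quotation of the corollary cited above), the classical, non-Strassen bandwidth-cost lower bound referred to in this literature is usually stated as follows, where W is the number of words a processor moves, G the number of scalar multiplications it performs, M its local memory size, and P the number of processors; the dense per-processor form follows by balancing the n^3 flops over P processors.

```latex
% Usual Hong--Kung / Irony--Toledo--Tiskin style statement of the classical
% bandwidth-cost lower bound, included here for context only.
\[
  W \;=\; \Omega\!\left(\frac{G}{\sqrt{M}}\right),
  \qquad\text{and for dense } n \times n \text{ matrices with } G = \frac{n^{3}}{P}:
  \qquad
  W \;=\; \Omega\!\left(\frac{n^{3}}{P\sqrt{M}}\right).
\]
```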
“…Because the distribution patterns of the nonzero entries in the two input sparse matrices are very diverse (consider plots of the matrices in Table I), input-space-based data decomposition [17], [9] normally does not yield efficient load balancing. One exception is computing SpGEMM for huge sparse matrices on large-scale distributed-memory systems, where 2D and 3D decompositions of the input space have demonstrated good load balancing and scalability by utilizing efficient communication strategies [29], [30], [2]. However, in this paper we mainly consider load balancing for fine-grained parallelism on GPU shared-memory architectures.…”
Section: Load Balancing (mentioning)
confidence: 99%
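To make the load-balancing point concrete, the following sketch (our own illustration, not code from any of the cited papers) computes the standard per-row flop estimate for C = A·B on CSR matrices: row i of C costs roughly the sum of nnz(B[j, :]) over the column indices j appearing in row i of A, so a uniform block-row split is balanced only when this quantity is roughly uniform across rows.

```python
# Sketch: estimate per-row work for C = A @ B with CSR matrices to show how
# skewed nonzero patterns lead to imbalance under uniform row partitions.
import numpy as np
import scipy.sparse as sp

n = 4096
A = sp.random(n, n, density=8 / n, format="csr", random_state=0)
B = sp.random(n, n, density=8 / n, format="csr", random_state=1)

nnz_per_row_B = np.diff(B.indptr)                    # nnz of each row of B
row_work = np.zeros(n, dtype=np.int64)
for i in range(n):
    cols = A.indices[A.indptr[i]:A.indptr[i + 1]]    # columns in row i of A
    row_work[i] = nnz_per_row_B[cols].sum()          # flops for row i of C

print("max/mean per-row work:", row_work.max() / max(row_work.mean(), 1))
```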
“…Very few classical algorithms describe the communication cost of sparse matrix-matrix multiplication. A unified communication analysis of existing and new algorithms, as well as an optimal lower bound on the communication cost of two new parallel algorithms, is given in [9]. In this paper, the optimal communication costs of three 1D algorithms, Naïve Block Row [8], Improved Block Row [23], and Outer Product [24], are outlined in terms of bandwidth and latency costs.…”
Section: Related Work (mentioning)
confidence: 99%
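For reference, here is a minimal sequential sketch of the outer-product formulation that an "Outer Product" style 1D algorithm distributes (each processor owning a block of columns of A and the matching block of rows of B); this is our own illustration under that assumption, not an implementation from [24].

```python
# Sketch: outer-product formulation of SpGEMM, C = sum_k A[:, k] @ B[k, :].
import scipy.sparse as sp

n, d = 256, 4
A = sp.random(n, n, density=d / n, format="csc", random_state=0)
B = sp.random(n, n, density=d / n, format="csr", random_state=1)

C = sp.csr_matrix((n, n))
for k in range(n):
    # rank-1 update: outer product of column k of A with row k of B
    C = C + A[:, k] @ B[k, :]

# Same result as the library product, up to floating-point summation order.
assert abs(C - (A @ B).tocsr()).sum() < 1e-8
```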