2013
DOI: 10.21236/ada580140

Communication Optimal Parallel Multiplication of Sparse Random Matrices

Abstract: Parallel algorithms for sparse matrix-matrix multiplication typically spend most of their time on inter-processor communication rather than on computation, and hardware trends predict the relative cost of communication will only increase. Thus, sparse matrix multiplication algorithms must minimize communication costs in order to scale to large processor counts. In this paper, we consider multiplying sparse matrices corresponding to Erdős-Rényi random graphs on distributed-memory parallel machines. We prove a ne…
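As a concrete, sequential illustration of the setting the abstract describes (not the paper's parallel algorithm), the sketch below builds two Erdős-Rényi-style sparse matrices with SciPy and multiplies them; the dimension n and the expected nonzeros per row d are arbitrary illustrative choices, and scipy.sparse.random stands in for the paper's random-graph model.

```python
# Sketch only: form two Erdos-Renyi-style sparse matrices and multiply them
# to illustrate the SpGEMM setting discussed in the abstract.
import scipy.sparse as sp

n = 1 << 13          # matrix dimension (arbitrary illustrative choice)
d = 8                # expected nonzeros per row, so density = d / n
density = d / n

# sp.random places nonzeros uniformly at random, matching the adjacency
# structure of an Erdos-Renyi random graph G(n, p) with p = d / n.
A = sp.random(n, n, density=density, format="csr", random_state=0)
B = sp.random(n, n, density=density, format="csr", random_state=1)

C = A @ B            # sequential SpGEMM; the paper studies its
                     # communication-optimal parallel counterpart
print(f"nnz(A)={A.nnz}, nnz(B)={B.nnz}, nnz(C)={C.nnz}")
```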

Cited by 41 publications (56 citation statements).
References 9 publications (11 reference statements).
“…We omit the proof: see Ballard (2013, §4.2.2.4) for full details. This bound is attainable in the sequential case by the algorithm presented in Ballard et al (2013c). See Section 3.3.4 for further discussion of symmetric indefinite algorithms.…”
Section: LTL^T Factorization (mentioning)
confidence: 96%
“…For algorithms attaining this bound in the dense case, see Section 3.3.1. For further discussion of this bound in the sparse case, see Ballard et al (2013c).…”
Section: Corollary 28, The Bandwidth Cost Lower Bound for Classical… (mentioning)
confidence: 99%
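For context (not a quotation of the corollary cited above), the classical, non-Strassen bandwidth-cost lower bound referred to in this literature is usually stated as follows, where W is the number of words a processor moves, G the number of scalar multiplications it performs, M its local memory size, and P the number of processors; the dense per-processor form follows by balancing the n^3 flops over P processors.

```latex
% Usual Hong--Kung / Irony--Toledo--Tiskin style statement of the classical
% bandwidth-cost lower bound, included here for context only.
\[
  W \;=\; \Omega\!\left(\frac{G}{\sqrt{M}}\right),
  \qquad\text{and for dense } n \times n \text{ matrices with } G = \frac{n^{3}}{P}:
  \qquad
  W \;=\; \Omega\!\left(\frac{n^{3}}{P\sqrt{M}}\right).
\]
```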
“…Because the distribution patterns of the nonzero entries in the two input sparse matrices are very diverse (consider plots of the matrices in Table I), input-space-based data decomposition [17], [9] normally does not yield efficient load balancing. One exception is computing SpGEMM for huge sparse matrices on large-scale distributed-memory systems, where 2D and 3D decompositions of the input space have demonstrated good load balancing and scalability by utilizing efficient communication strategies [29], [30], [2]. However, in this paper we mainly consider load balancing for fine-grained parallelism on GPU shared-memory architectures.…”
Section: Load Balancing (mentioning)
confidence: 99%
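To make the load-balancing point concrete, the following sketch (our own illustration, not code from any of the cited papers) computes the standard per-row flop estimate for C = A·B on CSR matrices: row i of C costs roughly the sum of nnz(B[j, :]) over the column indices j appearing in row i of A, so a uniform block-row split is balanced only when this quantity is roughly uniform across rows.

```python
# Sketch: estimate per-row work for C = A @ B with CSR matrices to show how
# skewed nonzero patterns lead to imbalance under uniform row partitions.
import numpy as np
import scipy.sparse as sp

n = 4096
A = sp.random(n, n, density=8 / n, format="csr", random_state=0)
B = sp.random(n, n, density=8 / n, format="csr", random_state=1)

nnz_per_row_B = np.diff(B.indptr)                    # nnz of each row of B
row_work = np.zeros(n, dtype=np.int64)
for i in range(n):
    cols = A.indices[A.indptr[i]:A.indptr[i + 1]]    # columns in row i of A
    row_work[i] = nnz_per_row_B[cols].sum()          # flops for row i of C

print("max/mean per-row work:", row_work.max() / max(row_work.mean(), 1))
```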
“…Very few classical algorithms describe the communication cost of sparse matrix-matrix multiplication. A unified communication analysis of existing and new algorithms, as well as an optimal lower bound on the communication cost of two new parallel algorithms, is given in [9]. In this paper, the optimal communication costs of three 1D algorithms, Naïve Block Row [8], Improved Block Row [23], and Outer Product [24], are outlined in terms of bandwidth and latency costs.…”
Section: Related Work (mentioning)
confidence: 99%
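For reference, here is a minimal sequential sketch of the outer-product formulation that an "Outer Product" style 1D algorithm distributes (each processor owning a block of columns of A and the matching block of rows of B); this is our own illustration under that assumption, not an implementation from [24].

```python
# Sketch: outer-product formulation of SpGEMM, C = sum_k A[:, k] @ B[k, :].
import scipy.sparse as sp

n, d = 256, 4
A = sp.random(n, n, density=d / n, format="csc", random_state=0)
B = sp.random(n, n, density=d / n, format="csr", random_state=1)

C = sp.csr_matrix((n, n))
for k in range(n):
    # rank-1 update: outer product of column k of A with row k of B
    C = C + A[:, k] @ B[k, :]

# Same result as the library product, up to floating-point summation order.
assert abs(C - (A @ B).tocsr()).sum() < 1e-8
```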