Alex Druinsky scite author profile

Asynchronous methods for solving systems of linear equations have been researched since Chazan and Miranker's pioneering 1969 paper on chaotic relaxation. The underlying idea of asynchronous methods is to avoid processor idle time by allowing the processors to continue to make progress even if not all progress made by other processors has been communicated to them.Historically, the applicability of asynchronous methods for solving linear equations was limited to certain restricted classes of matrices, such as diagonally dominant matrices. Furthermore, analysis of these methods focused on proving convergence in the limit. Comparison of the asynchronous convergence rate with its synchronous counterpart and its scaling with the number of processors were seldom studied, and are still not well understood.In this paper, we propose a randomized shared-memory asynchronous method for general symmetric positive definite matrices. We rigorously analyze the convergence rate and prove that it is linear, and is close to that of the method's synchronous counterpart if the processor count is not excessive relative to the size and sparsity of the matrix. We also present an algorithm for unsymmetric systems and overdetermined least-squares. Our work presents a significant improvement in the applicability of asynchronous linear solvers as well as in their convergence analysis, and suggests randomization as a key paradigm to serve as a foundation for asynchronous methods.

show abstract

Improving the Numerical Stability of Fast Matrix Multiplication

Ballard¹,

Benson²,

Druinsky³

et al. 2016

SIAM J. Matrix Anal. & Appl.

View full text Add to dashboard Cite

Fast algorithms for matrix multiplication, namely those that perform asymptotically fewer scalar operations than the classical algorithm, have been considered primarily of theoretical interest. Apart from Strassen's original algorithm, few fast algorithms have been efficiently implemented or used in practical applications. However, there exist many practical alternatives to Strassen's algorithm with varying performance and numerical properties. Fast algorithms are known to be numerically stable, but because their error bounds are slightly weaker than the classical algorithm, they are not used even in cases where they provide a performance benefit.We argue in this paper that the numerical sacrifice of fast algorithms, particularly for the typical use cases of practical algorithms, is not prohibitive, and we explore ways to improve the accuracy both theoretically and empirically. The numerical accuracy of fast matrix multiplication depends on properties of the algorithm and of the input matrices, and we consider both contributions independently. We generalize and tighten previous error analyses of fast algorithms and compare their properties. We discuss algorithmic techniques for improving the error guarantees from two perspectives: manipulating the algorithms, and reducing input anomalies by various forms of diagonal scaling. Finally, we benchmark performance and demonstrate our improved numerical accuracy.

show abstract

Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication

Ballard

Druinsky

Knight

et al. 2016

ACM Trans. Parallel Comput.

View full text Add to dashboard Cite

We propose a fine-grained hypergraph model for sparse matrix-matrix multiplication (SpGEMM), a key computational kernel in scientific computing and data analysis whose performance is often communication bound. This model correctly describes both the interprocessor communication volume along a critical path in a parallel computation and also the volume of data moving through the memory hierarchy in a sequential computation. We show that identifying a communication-optimal algorithm for particular input matrices is equivalent to solving a hypergraph partitioning problem. Our approach is sparsity dependent, meaning that we seek the best algorithm for the given input matrices.In addition to our (3D) fine-grained model, we also propose coarse-grained 1D and 2D models that correspond to simpler SpGEMM algorithms. We explore the relations between our models theoretically, and we study their performance experimentally in the context of three applications that use SpGEMM as a key computation. For each application, we find that at least one coarse-grained model is as communication efficient as the fine-grained model. We also observe that different applications have affinities for different algorithms.Our results demonstrate that hypergraphs are an accurate model for reasoning about the communication costs of SpGEMM as well as a practical tool for exploring the SpGEMM algorithm design space.

show abstract

Communication-Avoiding Symmetric-Indefinite Factorization

Ballard¹,

Becker²,

Demmel³

et al. 2014

SIAM J. Matrix Anal. & Appl.

View full text Add to dashboard Cite

Abstract. We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen's triangular tridiagonalization. It factors a dense symmetric matrix A as the product A = P LT L T P T where P is a permutation matrix, L is lower triangular, and T is block tridiagonal and banded. The algorithm is the first symmetric-indefinite communication-avoiding factorization: it performs an asymptotically optimal amount of communication in a two-level memory hierarchy for almost any cache-line size. Adaptations of the algorithm to parallel computers are likely to be communication efficient as well; one such adaptation has been recently published. The current paper describes the algorithm, proves that it is numerically stable, and proves that it is communication optimal.

show abstract

Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication

Ballard

Druinsky

Knight

et al. 2015

View full text Add to dashboard Cite

The performance of parallel algorithms for sparse matrixmatrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the input matrices. In this paper, we characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure. Obtaining an optimal algorithm corresponds to solving a hypergraph partitioning problem. Our hypergraph model generalizes several existing models for sparse matrix-vector multiplication, and we can leverage hypergraph partitioners developed for that computation to improve applicationspecific algorithms for multiplying sparse matrices.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Alex Druinsky

Revisiting Asynchronous Linear Solvers: Provable Convergence Rate through Randomization

Improving the Numerical Stability of Fast Matrix Multiplication

Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication

Communication-Avoiding Symmetric-Indefinite Factorization

Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication

Contact Info

Product

Resources

About