Minimizing communication in sparse matrix solvers

Mohiyuddin, Marghoob; Hoemmen, Mark Frederick; Demmel, James; Yelick, Katherine

doi:10.1145/1654059.1654096

Cited by 86 publications (88 citation statements)

References 17 publications (38 reference statements)

“…When it comes to subspace recycling methods or block iterative methods, Gram-Schmidt schemes are often used to perform this [37]. Belos uses by default the Iterated Modified Gram-Schmidt method, but it is also possible to switch to the TSQR method, first studied in the context of CA-GMRES [47]. In our implementation, we propose to use the CholQR method [48] since its efficiency has already been proved, once again in the context of CA-GMRES [49].…”

Section: A Generalized Conjugate Residual Methods With Inner Orthogon… (mentioning)

confidence: 99%

Block Iterative Methods and Recycling for Improved Scalability of Linear Solvers

Jolivet, Tournier (2016)

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis
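
The snippet contrasts Gram-Schmidt, TSQR, and CholQR for orthogonalizing a block of vectors. A minimal sequential sketch of CholQR follows, assuming the block V is stored as a list of rows and has full column rank; the name `cholqr` is hypothetical, and this is not the Belos or citing-paper implementation. In a distributed solver, the appeal is that step 1 needs only a single global reduction of a small n × n Gram matrix. The trade-off is that forming VᵀV squares the condition number of V, so orthogonality degrades when V is ill-conditioned.

```python
import math


def cholqr(V):
    """Hypothetical sketch of CholQR: factor V (m x n, list of rows) as V = Q R.

    Assumes V has full column rank; Q has orthonormal columns, R is upper triangular.
    """
    m, n = len(V), len(V[0])
    # Step 1: Gram matrix G = V^T V (in parallel: local products + one allreduce).
    G = [[sum(V[k][i] * V[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    # Step 2: Cholesky factorization G = R^T R with R upper triangular.
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = G[j][j] - sum(R[k][j] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("Gram matrix is not numerically positive definite")
        R[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            R[j][i] = (G[j][i] - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    # Step 3: Q = V R^{-1}; each row q satisfies q R = v, solved by substitution.
    Q = []
    for row in V:
        q = [0.0] * n
        for j in range(n):
            q[j] = (row[j] - sum(q[k] * R[k][j] for k in range(j))) / R[j][j]
        Q.append(q)
    return Q, R
```

For example, `Q, R = cholqr([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])` returns a 3 × 2 Q with orthonormal columns and a 2 × 2 upper-triangular R whose product reproduces the input.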

“…Since an SpMV must read a matrix entry from memory for every two useful floating-point operations, making it a highly memory-bound operation, Demmel et al. have proposed communication-avoiding algorithms that improve performance by trading redundant computation for memory traffic [1]. In communication-avoiding KSMs, SpMV is replaced by the matrix powers kernel, which computes Ax, A²x, …”

Section: Introduction (mentioning)

confidence: 99%

Auto-tuning the Matrix Powers Kernel with SEJITS

Morlan, Kamil, Fox (2013)

Lecture Notes in Computer Science

Abstract. The matrix powers kernel, used in communication-avoiding Krylov subspace methods, requires runtime auto-tuning for best performance. We demonstrate how the SEJITS (Selective Embedded Just-In-Time Specialization) approach can be used to deliver a high-performance and performance-portable implementation of the matrix powers kernel to application authors, while separating their high-level concerns from those of auto-tuner implementers involving low-level optimizations. The benefits of delivering this kernel in the form of a specializer, rather than a traditional library, are discussed. Performance of the matrix powers kernel specializer is evaluated in the context of a communication-avoiding conjugate gradient (CA-CG) solver, which compares favorably to traditional CG.
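
The straightforward way to compute x, Ax, A²x, …, Aˢx is s back-to-back SpMVs. Each of these reads all of A and, in a parallel setting, exchanges boundary entries of the vector. The sketch below contrasts that with a simplified, sequential simulation of the blocked "ghost zone" idea behind trading redundant computation for communication. Each row block reads every entry of x it will need for all s steps at once and recomputes some overlapping entries redundantly. It assumes A is stored in CSR form (`indptr`, `indices`, `data`) and is banded with half-bandwidth `bw`. All function names and parameters are hypothetical, and this is not the implementation from the cited papers.

```python
def matrix_powers(indptr, indices, data, x, s):
    """Reference: return [x, Ax, ..., A^s x] via s separate CSR SpMVs."""
    basis = [list(x)]
    for _ in range(s):
        prev, y = basis[-1], []
        for i in range(len(indptr) - 1):
            acc = 0.0
            for p in range(indptr[i], indptr[i + 1]):
                acc += data[p] * prev[indices[p]]
            y.append(acc)
        basis.append(y)
    return basis


def matrix_powers_blocked(indptr, indices, data, x, s, block, bw=1):
    """Ghost-zone variant (hypothetical sketch), assuming half-bandwidth bw.

    Each row block [lo, hi) copies x on [lo - s*bw, hi + s*bw) once, then runs
    all s steps locally; the computable range shrinks by bw per step, so rows
    near block edges are computed redundantly by neighboring blocks.
    """
    n = len(x)
    out = [[0.0] * n for _ in range(s + 1)]
    out[0] = list(x)
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        g_lo, g_hi = max(0, lo - s * bw), min(n, hi + s * bw)
        cur = {i: x[i] for i in range(g_lo, g_hi)}  # one "fetch" per block
        for k in range(1, s + 1):
            r_lo, r_hi = max(0, lo - (s - k) * bw), min(n, hi + (s - k) * bw)
            cur = {
                i: sum(data[p] * cur[indices[p]] for p in range(indptr[i], indptr[i + 1]))
                for i in range(r_lo, r_hi)
            }
            for i in range(lo, hi):  # keep only the rows this block owns
                out[k][i] = cur[i]
    return out
```

Both functions return the same monomial basis. The blocked version touches its slice of x (and, in a parallel code, would communicate) once per block instead of once per step, at the cost of about s·bw extra rows of work on each side of every block.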

“…To avoid memory bottlenecks, these algorithms are modified by performance programmers to improve the algorithm's temporal and spatial locality. These optimization techniques include irregular cache blocking [1], full sparse tiling [2], and communication avoiding algorithms [3].…”

Section: Introduction (mentioning)

confidence: 99%

Executing Optimized Irregular Applications Using Task Graphs within Existing Parallel Models

Krieger, Strout, Roelofs, et al. (2012)

2012 SC Companion: High Performance Computing, Networking Storage and Analysis

Abstract. Many sparse or irregular scientific computations are memory bound and benefit from locality-improving optimizations such as blocking or tiling. These optimizations result in asynchronous parallelism that can be represented by arbitrary task graphs. Unfortunately, most popular parallel programming models, with the exception of Threading Building Blocks (TBB), do not directly execute arbitrary task graphs. In this paper, we compare the programming and execution of arbitrary task graphs qualitatively and quantitatively in TBB, the OpenMP doall model, the OpenMP 3.0 task model, and Cilk Plus. We present performance and scalability results for 8- and 40-core shared-memory systems on a sparse matrix iterative solver and a molecular dynamics benchmark.
