2012
DOI: 10.21236/ada561679

Communication Avoiding and Overlapping for Numerical Linear Algebra

Abstract: To efficiently scale dense linear algebra problems to future exascale systems, communication cost must be avoided or overlapped. Communication-avoiding 2.5D algorithms improve scalability by reducing inter-processor data transfer volume at the cost of extra memory usage. Communication overlap attempts to hide messaging latency by pipelining messages and overlapping with computational work. We study the interaction and compatibility of these two techniques for two matrix multiplication algorithms (Cannon and SU…
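The abstract's first matrix multiplication algorithm, Cannon's algorithm, works by skewing the block distributions of A and B and then cyclically shifting blocks between neighboring processes each step. The communication pattern can be sketched with a serial simulation on a q × q "process grid" (a sketch of the block-shift structure only, not a parallel implementation; the function name and interface are illustrative, not from the report):

```python
import numpy as np

def cannon_matmul(A, B, q):
    """Serial simulation of Cannon's algorithm on a q x q process grid.

    A and B are n x n with n divisible by q. Each (i, j) "process" holds
    one block of A and B. After an initial skew, blocks of A shift left
    along rows and blocks of B shift up along columns each step, and the
    local block products accumulate into C.
    """
    n = A.shape[0]
    b = n // q  # block size per "process"
    # Initial skew: process (i, j) holds A-block (i, (i+j) mod q)
    # and B-block ((i+j) mod q, j).
    Ab = [[A[i*b:(i+1)*b, ((i+j) % q)*b:((i+j) % q + 1)*b] for j in range(q)]
          for i in range(q)]
    Bb = [[B[((i+j) % q)*b:((i+j) % q + 1)*b, j*b:(j+1)*b] for j in range(q)]
          for i in range(q)]
    C = np.zeros_like(A)
    for _ in range(q):
        for i in range(q):
            for j in range(q):
                C[i*b:(i+1)*b, j*b:(j+1)*b] += Ab[i][j] @ Bb[i][j]
        # Shift A-blocks one step left along rows, B-blocks one step up.
        Ab = [[Ab[i][(j+1) % q] for j in range(q)] for i in range(q)]
        Bb = [[Bb[(i+1) % q][j] for j in range(q)] for i in range(q)]
    return C
```

In the parallel algorithm each shift is a point-to-point message, which is exactly where the report's overlap technique applies: the next shift can be posted while the current block product is computed.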

Cited by 13 publications (16 citation statements)
References 12 publications
“…That is, the product of the number of words and the number of messages sent is Θ(n²); this trade-off is shown to be necessary in . We mention here only one of many speed-ups: up to 2.1× for 2.5D LU on 64K cores of an IBM BG/P machine compared to previous parallel LU factorization (Georganas et al 2012). A similar approach, applied to the direct N-body problem, leads to speed-ups of up to 11.8× on the 32K-core IBM BG/P, compared to similarly tuned 2D algorithms (Driscoll et al 2013).…”
Section: Parallel Case
confidence: 94%
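The Θ(n²) trade-off cited above can be made concrete with a rough cost model for 2.5D LU: with c replicated copies of the matrix across p processors, bandwidth cost falls as replication grows while latency cost rises, and their product is fixed at n². A minimal sketch, assuming the asymptotic forms with constants dropped (the function name and exact expressions are illustrative, not taken from the cited work):

```python
import math

def lu_25d_costs(n, p, c):
    """Rough per-processor communication costs for 2.5D LU on p
    processors with c replicated matrix copies (constants dropped).

    Assumed asymptotic model: more replication (larger c) lowers the
    number of words moved but raises the number of messages.
    """
    words = n**2 / math.sqrt(c * p)  # bandwidth cost
    messages = math.sqrt(c * p)      # latency cost
    return words, messages
```

Under this model, `words * messages == n**2` for every choice of p and c, illustrating why reducing data volume via replication necessarily increases message count: the two costs can be traded against each other but their product cannot be reduced.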
“…This algorithm can be applied to symmetric positive definite matrices, though it uses explicit triangular matrix inversion and multiplication (ignoring stability issues) and also ignores symmetry. Georganas et al (2012) extend the ideas of Solomonik and Demmel (2011) to the symmetric positive definite case, saving arithmetic by exploiting symmetry and maintaining stability by using triangular solves. Lipshitz (2013) provides a similar algorithm for Cholesky factorization, along with a recursive algorithm for triangular solve, that also maintains symmetry and stability.…”
Section: Parallel Case
confidence: 99%
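The stability point in the statement above, using triangular solves rather than explicit triangular inversion, is visible even in the sequential blocked algorithm that underlies the distributed variants. A minimal sketch of right-looking blocked Cholesky (illustrative code, not the parallel algorithm from the cited papers):

```python
import numpy as np

def blocked_cholesky(A, b):
    """Blocked right-looking Cholesky: returns lower-triangular L with
    L @ L.T == A, for symmetric positive definite A, block size b.

    The off-diagonal panel is formed with a triangular solve against the
    diagonal block's factor, rather than by forming its explicit inverse,
    which is the stability point noted above. Symmetry is exploited in
    the trailing update, which touches only L21 @ L21.T.
    """
    n = A.shape[0]
    A = A.astype(float).copy()
    L = np.zeros_like(A)
    for k in range(0, n, b):
        e = min(k + b, n)
        # Factor the diagonal block A11 = L11 @ L11.T.
        L[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])
        if e < n:
            # Panel: solve L21 @ L11.T = A21 (a triangular solve in
            # practice; np.linalg.solve is used here for brevity).
            L[e:, k:e] = np.linalg.solve(L[k:e, k:e], A[e:, k:e].T).T
            # Symmetric rank-b update of the trailing matrix.
            A[e:, e:] -= L[e:, k:e] @ L[e:, k:e].T
    return L
```

Each panel factorization, triangular solve, and trailing update maps to a communication phase in the distributed setting, which is where the 2.5D replication ideas apply.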
“…In this work, we broadly focus on the case of large-scale dense linear algebra. This domain has a rich literature of parallel communication-avoiding algorithms and existing high performance implementations [2,5,6,19].…”
Section: Linear Algebra Algorithms
confidence: 99%
“…Overlapping computation and communication has long been considered an avenue for optimizing parallel performance [33]. Benefits of the overlap have been explored for different types of algorithms [47] and on different architectures [102]. The co-processor mode of operation of Blue Gene/L [3] paired an application processor with another processor dedicated to handling its communication tasks.…”
Section: Lazy Evaluation and Its Use For Optimizing Parallel Performance
confidence: 99%
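The overlap idea described in this last statement, hiding communication latency behind useful work, can be sketched with a simple prefetch pipeline: post the next "receive" before computing on the current chunk, so the fetch proceeds in the background (generic `fetch`/`compute` callables stand in for real message passing; the names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def overlapped(chunks, fetch, compute):
    """Process chunks while overlapping fetch (communication) with compute.

    While the main thread computes on chunk i, a worker thread fetches
    chunk i+1, so fetch latency is hidden behind computation -- a
    software analogue of dedicating a processor to communication, as in
    Blue Gene/L's co-processor mode mentioned above.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        nxt = pool.submit(fetch, chunks[0])          # start first "receive"
        for i in range(len(chunks)):
            data = nxt.result()                      # wait for in-flight fetch
            if i + 1 < len(chunks):
                nxt = pool.submit(fetch, chunks[i + 1])  # post next fetch early
            results.append(compute(data))            # compute overlaps the fetch
    return results
```

With blocking communication the total time would be the sum of fetch and compute times; with this pipeline it approaches the maximum of the two, which is the benefit the cited works measure.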