2014
DOI: 10.1017/s0962492914000038

Communication lower bounds and optimal algorithms for numerical linear algebra

Abstract: The traditional metric for the efficiency of a numerical algorithm has been the number of arithmetic operations it performs. Technological trends have long been reducing the time to perform an arithmetic operation, so it is no longer the bottleneck in many algorithms; rather, communication, or moving data, is the bottleneck. This motivates us to seek algorithms that move as little data as possible, either between levels of a memory hierarchy or between parallel processors over a network. In this paper we summa…

Cited by 99 publications (136 citation statements)
References 172 publications (251 reference statements)
“…The strategy proposed in Eqn. (5) for selecting the nodes to keep the redundant copies of $p^{(j-1)}_{I_i}$ and $p^{(j)}_{I_i}$ is a reasonably good heuristic for minimizing communication overheads during SpMV if we assume that the entries of the system matrix $A$ are mostly clustered around the diagonal (since it then is likely that there are some elements which have to be sent anyway from node $i$ to node $d_{ik}$ and, thus, there is no extra latency for establishing a new connection; see Sec. 5 for a more detailed discussion).…”
Section: Tolerating Multiple Node Failures (mentioning)
confidence: 99%
“…For example, here is a citation from [5] relevant to our study of pivoting: "The traditional metric for the efficiency of a numerical algorithm has been the number of arithmetic operations it performs. Technological trends have long been reducing the time to perform an arithmetic operation, so it is no longer the bottleneck in many algorithms; rather, communication, or moving data, is the bottleneck".…”
Section: Numerical Gaussian Elimination With No Pivoting and Block Ga (mentioning)
confidence: 99%
“…Serial and parallel variants of the matrix powers kernel, for both structured and general sparse matrices, are described in [31] and [2], which summarize most of [14] and elaborate on the implementation in [32]. Within [31], we refer the reader to the complexity analysis in Tables 2.3-4, the performance modeling in section 2.6, and the performance results in section 2.10.3 and section 2.11.3, which demonstrate that this optimization leads to speedups in practice.…”
Section: Communication-avoiding Kernels (mentioning)
confidence: 99%