Proceedings of the 2009 International Symposium on Symbolic and Algebraic Computation (ISSAC 2009)
DOI: 10.1145/1576702.1576713

Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm

Abstract: We propose several new schedules for Strassen-Winograd's matrix multiplication algorithm; they reduce the extra memory allocation requirements by three different means: by introducing a few pre-additions, by overwriting the input matrices, or by using a first recursive level of classical multiplication. In particular, we show two fully in-place schedules: one having the same number of operations, if the input matrices can be overwritten; the other one, slightly increasing the constant of the leading term of th…
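To make the abstract concrete, the following is a minimal NumPy sketch of the classical Strassen-Winograd recursion the paper's schedules reorganize: 7 recursive multiplications plus 15 additions per level. This is the standard textbook formulation, not the paper's memory-efficient or in-place schedules; the function name `winograd` and the `cutoff` parameter are illustrative choices, not from the paper.

```python
import numpy as np

def winograd(A, B, cutoff=8):
    """Strassen-Winograd: 7 multiplications, 15 additions per recursion level.

    A textbook sketch (not the paper's low-memory schedules); assumes square
    inputs whose size is even down to the cutoff.
    """
    n = A.shape[0]
    if n <= cutoff or n % 2:
        return A @ B  # fall back to classical multiplication
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # 8 pre-additions
    S1 = A21 + A22; S2 = S1 - A11; S3 = A11 - A21; S4 = A12 - S2
    T1 = B12 - B11; T2 = B22 - T1; T3 = B22 - B12; T4 = T2 - B21
    # 7 recursive products
    P1 = winograd(A11, B11, cutoff); P2 = winograd(A12, B21, cutoff)
    P3 = winograd(S4, B22, cutoff);  P4 = winograd(A22, T4, cutoff)
    P5 = winograd(S1, T1, cutoff);   P6 = winograd(S2, T2, cutoff)
    P7 = winograd(S3, T3, cutoff)
    # 7 post-additions
    U2 = P1 + P6; U3 = U2 + P7; U4 = U2 + P5
    C = np.empty((n, n), dtype=A.dtype)
    C[:h, :h] = P1 + P2   # C11
    C[:h, h:] = U4 + P3   # C12
    C[h:, :h] = U3 - P4   # C21
    C[h:, h:] = U3 + P5   # C22
    return C
```

The temporaries S1–S4, T1–T4, P1–P7 are exactly the extra allocations whose scheduling the paper optimizes.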

Cited by 31 publications (27 citation statements)
References 14 publications
“…In this section, we provide schedules for Bini's approximate multiplication, in a similar fashion to [BDPZ09]. These schedules are then implemented.…”
Section: Memory Usage and Scheduling (mentioning)
confidence: 99%
“…This requirement is smaller than Strassen-Winograd's (cf. [BDPZ09]), where X is of size m/2 × max(k/2, n/2) and Y has size k/2 × n/2. For instance, for m = n = k, we have (5/12)m² for Bini's and (1/2)m² for Strassen-Winograd's.…”
Section: Lemma 1: The Extra Memory Used For One Level Of Recursion In… (mentioning)
confidence: 99%
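The constants in the quote above can be checked with exact rational arithmetic. This is a sketch: Strassen-Winograd's 1/2 follows from the two stated buffer sizes, while the 5/12 figure for Bini's algorithm is taken as given from the citing paper.

```python
from fractions import Fraction

# For m = n = k, Strassen-Winograd's two temporaries (cf. [BDPZ09]) are
# X: (m/2) x max(k/2, n/2) = (m/2) x (m/2) and Y: (k/2) x (n/2) = (m/2) x (m/2),
# so the extra memory for one recursion level is 2 * (m/2)^2 = (1/2) m^2.
sw_extra = 2 * Fraction(1, 2) * Fraction(1, 2)  # in units of m^2
bini_extra = Fraction(5, 12)                    # value quoted for Bini's algorithm
print(sw_extra, bini_extra, bini_extra < sw_extra)
```

This confirms the quoted ordering: Bini's (5/12)m² extra memory is indeed below Strassen-Winograd's (1/2)m².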
“…Even if the asymptotically fastest algorithms [6,22] are not practicable, matrix multiplication happens to be also the most efficient building block in practice [7,17], thus making the reduction trees (or DAGs) from complexity theory still highly relevant in practice. The development of efficient exact linear algebra software then consists in making these reductions effective: fine tuning of the building block, taking the best advantage of the available computer arithmetic [10,2,1,9,19] and minimizing the memory footprint [3]; improving existing reductions in the leading constant of their time and space complexities [20].…”
Section: Reductions To Building Blocks (mentioning)
confidence: 99%
“…The routine fgemm in Fflas uses by default the classic schedules for the multiplication and the product with accumulation (cf. [5]), but we also implement the low memory routines therein. The new algorithms are competitive and can reach sizes that were limiting.…”
Section: New Algorithms For the Mul Solution (mentioning)
confidence: 99%