2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2017
DOI: 10.1109/ipdps.2017.56

Generating Families of Practical Fast Matrix Multiplication Algorithms

Abstract: Matrix multiplication (GEMM) is a core operation to numerous scientific applications. Traditional implementations of Strassen-like fast matrix multiplication (FMM) algorithms often do not perform well except for very large matrix sizes, due to the increased cost of memory movement, which is particularly noticeable for non-square matrices. Such implementations also require considerable workspace and modifications to the standard BLAS interface. We propose a code generator framework to automatically implement a …

citations
Cited by 28 publications
(26 citation statements)
references
References 23 publications
(41 reference statements)
“…• A number of recent papers explore practical implementations of Strassen-like fast matrix multiplications [9,8]. How to extend fast matrix multiplication with different partition block sizes for tensor contraction is an open question.…”
Section: Discussion (mentioning)
confidence: 99%
“…As demonstrated in [7], this approach makes Strassen practical for smaller matrices and matrices of special shape (importantly, for rank-k updates, where N_p is relatively small comparing to N_i and N_j). This research is pushed further [8] by revealing that Strassen performs relatively better than most other Strassen-like FMM algorithms with one or two levels of recursions, when modeled as well as in practice. For this reason, we do not extend those FMM algorithms to TC in this paper, although it may be worthwhile in future work to pursue certain of these algorithms for highly non-square tensor contraction shapes.…”
Section: High-performance Strassen (mentioning)
confidence: 99%
“…Ballard, Demmel, Holtz, Lipshitz, and Schwartz [15], Ballard, Demmel, Holtz, and Schwartz [16], and Lipshitz, Ballard, Demmel, and Schwartz [77]. Recent engineering work includes Benson and Ballard [23] and Huang, Rice, Matthews, and van de Geijn [65]. Our work differs from these works in that we seek a self-contained proof-of-concept demonstration requiring good-performance finite-field matrix multiplication on GPUs, but we do not necessarily seek the most optimized possible implementation; such optimizations are left for future work.…”
Section: Fast Matrix Multiplication (mentioning)
confidence: 99%