Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2015
DOI: 10.1145/2688500.2688513

A framework for practical parallel fast matrix multiplication

Abstract: Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and Strassen's fast algorithm on modest problem sizes and shapes. Furthermore, we show that the best choice of fast algorithm depends not only on the size of the matrices but also on the shape. We develop a code generation tool to automatically implement multiple sequential and share…

Cited by 27 publications (25 citation statements)
References 33 publications
“…In our experiments, we use the effective Gfops (giga field operations per second) metric, also used in [12,25,2], defined as Gfops = (# of field operations using the classical matrix product) / time.…”
Section: Methodology of Experiments
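The effective Gfops metric quoted above charges every algorithm the classical operation count, so a fast algorithm that finishes sooner reports a higher rate. Under the standard assumption that the classical product of an m×k and a k×n matrix costs 2mkn field operations, the metric can be sketched as:

```python
def effective_gfops(m, k, n, seconds):
    """Effective Gflops: field operations of the *classical* product
    divided by measured time, regardless of the algorithm actually run."""
    classical_ops = 2 * m * k * n  # m*k*n multiplications + ~m*k*n additions
    return classical_ops / seconds / 1e9

# Example: a 1000x1000x1000 product completed in 1 second
# rates as 2 * 1000**3 / 1e9 = 2.0 effective Gflops.
```

Because the numerator is fixed, comparing effective Gfops across classical and sub-cubic algorithms directly compares wall-clock time on the same problem.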
“…In particular, numerical linear algebra based on Strassen's algorithm (if numerical stability issues are considered acceptable) should clearly benefit from most of its results. Related work on the parallelization of sub-cubic numerical linear algebra includes [1,24,6,25,2].…”
Section: Introduction
“…Both of the¹ […] ¹ In this paper, we distinguish the sorting network and the merging network. ² We use the Integer datatype as the representative of the one-word type, and the Double datatype for the two-word type.…”
Section: Intel MIC Vector Architecture
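The one-word versus two-word distinction in the snippet above matters for SIMD throughput: Intel MIC registers are 512 bits wide, so they hold twice as many one-word elements as two-word ones. A minimal sketch of that lane arithmetic (the constant and helper name are illustrative, not from the cited paper):

```python
VECTOR_BITS = 512  # Intel MIC (Xeon Phi) SIMD register width

def lanes(element_bits):
    """Number of elements of the given width that fit in one register."""
    return VECTOR_BITS // element_bits

# One-word 32-bit Integer: 16 lanes per register.
# Two-word 64-bit Double:   8 lanes per register.
```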
“…Mint and Physis [26,16] can generate effective GPU code for stencil computations. Benson et al. [2] provide a code generation tool to automatically implement various matrix multiplication algorithms. To facilitate the utilization of intra-core resources, Huo et al. [12] present a system with runtime SIMD parallelization via overridden operators and functions.…”
Section: Related Work
“…From a practical perspective, it is unlikely that the techniques for obtaining the best upper bounds on the exponent can be translated to practical algorithms that will execute faster than the classical one for reasonably sized matrices. In this paper, we are interested in the numerical stability of practical algorithms that have been demonstrated to outperform the classical algorithm (as well as Strassen's in some instances) on modern hardware [3].…”
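As a concrete illustration of the practical fast algorithms discussed above, one level of Strassen's recursion trades the classical eight half-size products for seven, at the cost of extra additions (and the mild numerical-stability concerns the snippet mentions). A minimal sketch, assuming square matrices of even dimension and using NumPy for the base-case products:

```python
import numpy as np

def strassen_one_level(A, B):
    """One level of Strassen's algorithm: 7 half-size products
    instead of the classical 8. Assumes A, B are n x n with n even."""
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]

    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)

    C = np.empty_like(A)
    C[:n, :n] = M1 + M4 - M5 + M7  # C11
    C[:n, n:] = M3 + M5            # C12
    C[n:, :n] = M2 + M4            # C21
    C[n:, n:] = M1 - M2 + M3 + M6  # C22
    return C
```

Applying the recursion to the seven sub-products (rather than stopping after one level, as here) yields the O(n^2.81) complexity; the paper's framework generalizes this scheme to other base-case splittings whose best choice depends on matrix size and shape.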