2018
DOI: 10.1145/3157733

Design of a High-Performance GEMM-like Tensor–Tensor Multiplication

Abstract: We present "GEMM-like Tensor-Tensor multiplication" (GETT), a novel approach for dense tensor contractions that mirrors the design of a high-performance general matrix-matrix multiplication (GEMM). The critical insight behind GETT is the identification of three index sets, involved in the tensor contraction, which enable us to systematically reduce an arbitrary tensor contraction to loops around a highly tuned "macro-kernel". This macro-kernel operates on suitably prepared ("packed") sub-tensors that reside in…
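The abstract's "three index sets" can be illustrated with a minimal pure-Python sketch (not the paper's implementation, and with none of GETT's packing or tuning): for a contraction such as C[a,b,i,j] = Σ_k A[a,k,i]·B[k,b,j], the free indices of A ({a,i}), the free indices of B ({b,j}), and the contracted indices ({k}) can each be flattened into a single bundle, which reduces the contraction to a plain matrix multiplication.

```python
# Hypothetical illustration: the contraction C[a,b,i,j] = sum_k A[a,k,i]*B[k,b,j]
# involves three index sets:
#   free indices of A  -> {a, i}
#   free indices of B  -> {b, j}
#   contracted indices -> {k}
# Flattening each bundle maps the contraction onto a plain matrix multiply.

def contract_direct(A, B, da, dk, di, db, dj):
    """Direct loop nest over all five indices."""
    C = [[[[0 for _ in range(dj)] for _ in range(di)]
          for _ in range(db)] for _ in range(da)]
    for a in range(da):
        for b in range(db):
            for i in range(di):
                for j in range(dj):
                    for k in range(dk):
                        C[a][b][i][j] += A[a][k][i] * B[k][b][j]
    return C

def contract_as_gemm(A, B, da, dk, di, db, dj):
    """Same contraction as a (da*di) x dk by dk x (db*dj) matrix multiply."""
    # flatten the (a,i) bundle into rows and the (b,j) bundle into columns
    M = [[A[a][k][i] for k in range(dk)] for a in range(da) for i in range(di)]
    N = [[B[k][b][j] for b in range(db) for j in range(dj)] for k in range(dk)]
    P = [[sum(M[r][k] * N[k][c] for k in range(dk)) for c in range(db * dj)]
         for r in range(da * di)]
    # unflatten rows (a*di + i) and columns (b*dj + j) back into C[a][b][i][j]
    return [[[[P[a * di + i][b * dj + j] for j in range(dj)] for i in range(di)]
             for b in range(db)] for a in range(da)]
```

Both routines produce identical results on any conforming operands; GETT's contribution is performing this mapping implicitly, via packed sub-tensors and a tuned macro-kernel, rather than by materializing the flattened matrices as done here.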

Cited by 63 publications (65 citation statements)
References 44 publications
“…Although rearranging compound expressions may drastically reduce the required computational work in some cases, the efficient implementation of primitive tensor operations is necessary for the effective use of hardware. Traditionally, BLAS (Basic Linear Algebra Subroutines) implementations provide optimized dense linear algebra operations; Springer and Bientinesi [51] present efficient strategies for generalising to tensor-tensor multiplication. However, these routines are optimized for large tensors and matrices.…”
Section: Related Work
confidence: 99%
“…An early effort for dense higher-order tensor algebra was the Tensor Contraction Engine [Auer et al 2006]. libtensor [Epifanovsky et al 2013], CTF [Solomonik et al 2014], and GETT [Springer and Bientinesi 2016] are examples of systems and techniques that transform tensor contractions into dense matrix multiplications by transposing tensor operands. TBLIS [Matthews 2017] and InTensLi [Li et al 2015] avoid explicit transpositions by computing tensor contractions in-place.…”
Section: Tensor Storage Abstractions and Code Generation
confidence: 99%
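The transposition-based strategy this statement contrasts against (transform the contraction into a dense GEMM by transposing tensor operands) can be sketched in a few lines of pure Python; this is an illustrative toy, not any cited system's code. For C[i][j] = Σ_k A[k][i]·B[k][j], the contracted index k leads in A, so an explicit transpose of A is materialized before the multiply:

```python
# Toy sketch of the transpose-then-GEMM strategy: for
# C[i][j] = sum_k A[k][i] * B[k][j], first materialize A^T, then run a
# standard matrix multiply. Systems such as TBLIS avoid this explicit copy.

def transpose(A):
    return [list(row) for row in zip(*A)]

def gemm(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def contract_via_transpose(A, B):
    return gemm(transpose(A), B)   # C = A^T * B
```

The explicit transpose costs extra memory traffic and storage proportional to A, which is exactly the overhead the in-place approaches (TBLIS, InTensLi) and GETT's packing-based design aim to eliminate.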
“…We will leverage this observation in our distributed memory experiment. Figure 6: Performance for representative user cases of benchmark from [6]. TC is identified by the index string, with the tensor index bundle of each tensor in the order C-A-B, e.g.…”
Section: Performance Model
confidence: 99%
“…Standing on the shoulders of giants. This paper builds upon a number of recent developments: The GotoBLAS algorithm for matrix multiplication (GEMM) [1] that underlies the currently fastest implementations of GEMM for CPUs; The refactoring of the GotoBLAS algorithm as part of the BLAS-like Library Instantiation Software (BLIS) [2,3], which exposes primitives for implementing BLAS-like operations; The systematic parallelization of the loops that BLIS exposes so that high-performance can be flexibly attained on multicore and many-core architectures [4]; The casting of tensor contraction (TC) in terms of the BLIS primitives [5,6] without requiring the transposition (permutation) used by traditional implementations; The practical high-performance implementation of the classical Strassen's algorithm (Strassen) [7] in terms of variants of the BLIS primitives; and the extension of this implementation [8] to a family of Strassen-like algorithms (Fast Matrix Multiplication algorithms) [9]. Together, these results facilitate what we believe to be the first extension of Strassen's algorithm to TC.…”
Section: Introduction
confidence: 99%
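The Strassen building block this statement leans on is standard and easy to show concretely; the following is a one-level pure-Python sketch for 2x2 matrices (7 multiplications instead of 8), not the cited BLIS-primitive implementation, which fuses these computations into packing and micro-kernels:

```python
# One-level Strassen multiply for 2x2 matrices: 7 scalar multiplications
# instead of the naive 8. Applied recursively to blocks, this yields the
# O(n^2.807) algorithm the cited work extends to tensor contractions.

def strassen_2x2(A, B):
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]
```

The entries here can themselves be matrix blocks (with +, -, * acting blockwise), which is how the recursion, and in the cited work the extension to tensor contraction via BLIS-style primitives, proceeds.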