Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2015
DOI: 10.1145/2807591.2807671
An input-adaptive and in-place approach to dense tensor-times-matrix multiply

Cited by 57 publications (48 citation statements)
References 35 publications

“…From (2.12) and Figure 2.10, the equivalent matrix form is C_(n) = B A_(n), which allows us to employ established fast matrix-by-vector and matrix-by-matrix multiplications when dealing with very large-scale tensors. Efficient and optimized algorithms for TTM are, however, still emerging [11,12,131].…”
Section: Symmetric Tensor Decomposition (mentioning, confidence: 99%)
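As a quick illustration of the matricized form quoted above (C_(n) = B A_(n)), the following is a minimal NumPy sketch of mode-n TTM via unfolding. The function name ttm and the unfolding convention (mode n moved to the rows, remaining modes flattened in their original order) are illustrative assumptions, not the implementation from the cited works.

    import numpy as np

    def ttm(A, B, n):
        # Mode-n tensor-times-matrix, C = A x_n B, via the matricized
        # identity C_(n) = B @ A_(n). Illustrative sketch only.
        A_n = np.moveaxis(A, n, 0).reshape(A.shape[n], -1)   # unfold: A_(n)
        C_n = B @ A_n                                         # C_(n) = B A_(n)
        new_shape = (B.shape[0],) + A.shape[:n] + A.shape[n + 1:]
        return np.moveaxis(C_n.reshape(new_shape), 0, n)     # fold back

    # Example: contract a 4 x 5 x 6 tensor with a 3 x 5 matrix along mode 1.
    A = np.random.rand(4, 5, 6)
    B = np.random.rand(3, 5)
    C = ttm(A, B, 1)
    assert C.shape == (4, 3, 6)
    assert np.allclose(C, np.einsum('jb,abc->ajc', B, A))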
“…Computing MTTKRP for dense tensors has also been considered. Nonetheless, these works are often concerned with practical implementation schemes such as parallelization and memory-efficient computation strategies, but the number of computational flops required is naturally high for the dense tensor case; see, e.g., [40,41].…”
Section: MTTKRP (mentioning, confidence: 99%)
“…Other related work exploits the data layout of matricized tensors and avoids reordering tensor entries, using ideas similar to ours for a different tensor computation, known as tensor-times-matrix (TTM). Li et al [14] develop a parallelization framework for computing TTMs with dense tensors on multicore platforms. Austin et al…
Section: Related Work (mentioning, confidence: 99%)
“…Our main idea of 1-Step MTTKRP is to perform the matrix multiplication without reordering tensor entries, using multiple BLAS calls. Our algorithm is based on the observation that, given the natural linearization of tensor entries, the nth-mode matricization can be seen as a contiguous set of submatrices, each of which is stored row-major in memory [5,14]. Figure 2 shows how X_(n) is ordered in memory, and it also shows how the KRP matrix K can be conformally partitioned to perform the matrix multiplication as a block inner product.…”
Section: 1-Step MTTKRP (mentioning, confidence: 99%)
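The block-inner-product formulation described in this excerpt can be sketched concretely for the middle mode of a 3-way tensor stored in natural (row-major) order: each slice X[i] is a contiguous row-major submatrix, and the Khatri-Rao product is partitioned conformally into row blocks. Below is a minimal NumPy sketch of that idea under those assumptions; the function name and the specific mode choice are illustrative, not the cited authors' code.

    import numpy as np

    def mttkrp_middle_mode_blocked(X, A, C):
        # MTTKRP for the middle mode of a 3-way tensor X (I x J x K),
        # computed as a block inner product without reordering entries:
        # M = sum_i X[i] @ K_i, where X[i] is the i-th contiguous slice and
        # K_i = C * A[i, :] is the i-th row block of the Khatri-Rao product.
        I, J, K = X.shape
        R = A.shape[1]
        M = np.zeros((J, R))
        for i in range(I):
            M += X[i] @ (C * A[i, :])   # one GEMM per contiguous block
        return M

    # Reference check against the explicit matricized formulation.
    I, J, K, R = 4, 5, 6, 3
    X = np.random.rand(I, J, K)
    A, C = np.random.rand(I, R), np.random.rand(K, R)
    krp = np.einsum('ir,kr->ikr', A, C).reshape(I * K, R)   # Khatri-Rao product
    M_ref = X.transpose(1, 0, 2).reshape(J, I * K) @ krp    # matricized X times KRP
    assert np.allclose(mttkrp_middle_mode_blocked(X, A, C), M_ref)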