Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2015
DOI: 10.1145/2807591.2807671
An input-adaptive and in-place approach to dense tensor-times-matrix multiply

Cited by 57 publications (48 citation statements)
References 35 publications

“…From (2.12) and Figure 2.10, the equivalent matrix form is C_(n) = B A_(n), which allows us to employ established fast matrix-by-vector and matrix-by-matrix multiplications when dealing with very large-scale tensors. Efficient and optimized algorithms for TTM are, however, still emerging [11,12,131].…”
Section: Symmetric Tensor Decomposition (mentioning, confidence: 99%)
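As a quick illustration of the matricized form quoted above (C_(n) = B A_(n)), the following is a minimal NumPy sketch of mode-n TTM via unfolding. The function name ttm and the unfolding convention (mode n moved to the rows, remaining modes flattened in their original order) are illustrative assumptions, not the implementation from the cited works.

    import numpy as np

    def ttm(A, B, n):
        # Mode-n tensor-times-matrix, C = A x_n B, via the matricized
        # identity C_(n) = B @ A_(n). Illustrative sketch only.
        A_n = np.moveaxis(A, n, 0).reshape(A.shape[n], -1)   # unfold: A_(n)
        C_n = B @ A_n                                         # C_(n) = B A_(n)
        new_shape = (B.shape[0],) + A.shape[:n] + A.shape[n + 1:]
        return np.moveaxis(C_n.reshape(new_shape), 0, n)     # fold back

    # Example: contract a 4 x 5 x 6 tensor with a 3 x 5 matrix along mode 1.
    A = np.random.rand(4, 5, 6)
    B = np.random.rand(3, 5)
    C = ttm(A, B, 1)
    assert C.shape == (4, 3, 6)
    assert np.allclose(C, np.einsum('jb,abc->ajc', B, A))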
“…Computing MTTKRP for dense tensors has also been considered. Nonetheless, these works are often concerned with practical implementation schemes such as parallelization and memory-efficient computation strategies, but the number of computational flops required is naturally high for the dense tensor case; see, e.g., [40,41].…”
Section: MTTKRP (mentioning, confidence: 99%)
“…Other related work exploits the data layout of matricized tensors and avoids reordering tensor entries, using ideas similar to ours for a different tensor computation, known as tensor-times-matrix (TTM). Li et al [14] develop a parallelization framework for computing TTMs with dense tensors on multicore platforms. Austin et al…
Section: Related Work (mentioning, confidence: 99%)
“…Our main idea of 1-Step MTTKRP is to perform the matrix multiplication without reordering tensor entries, using multiple BLAS calls. Our algorithm is based on the observation that, given the natural linearization of tensor entries, the nth-mode matricization can be seen as a contiguous set of submatrices, each of which is stored row-major in memory [5,14]. Figure 2 shows how X_(n) is ordered in memory, and it also shows how the KRP matrix K can be conformally partitioned to perform the matrix multiplication as a block inner product.…”
Section: 1-Step MTTKRP (mentioning, confidence: 99%)
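The block-inner-product formulation described in this excerpt can be sketched concretely for the middle mode of a 3-way tensor stored in natural (row-major) order: each slice X[i] is a contiguous row-major submatrix, and the Khatri-Rao product is partitioned conformally into row blocks. Below is a minimal NumPy sketch of that idea under those assumptions; the function name and the specific mode choice are illustrative, not the cited authors' code.

    import numpy as np

    def mttkrp_middle_mode_blocked(X, A, C):
        # MTTKRP for the middle mode of a 3-way tensor X (I x J x K),
        # computed as a block inner product without reordering entries:
        # M = sum_i X[i] @ K_i, where X[i] is the i-th contiguous slice and
        # K_i = C * A[i, :] is the i-th row block of the Khatri-Rao product.
        I, J, K = X.shape
        R = A.shape[1]
        M = np.zeros((J, R))
        for i in range(I):
            M += X[i] @ (C * A[i, :])   # one GEMM per contiguous block
        return M

    # Reference check against the explicit matricized formulation.
    I, J, K, R = 4, 5, 6, 3
    X = np.random.rand(I, J, K)
    A, C = np.random.rand(I, R), np.random.rand(K, R)
    krp = np.einsum('ir,kr->ikr', A, C).reshape(I * K, R)   # Khatri-Rao product
    M_ref = X.transpose(1, 0, 2).reshape(J, I * K) @ krp    # matricized X times KRP
    assert np.allclose(mttkrp_middle_mode_blocked(X, A, C), M_ref)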