2019
DOI: 10.1137/18m1210691

Software for Sparse Tensor Decomposition on Emerging Computing Architectures

Abstract: In this paper, we develop software for decomposing sparse tensors that is portable to and performant on a variety of multicore, manycore, and GPU computing architectures. The result is a single code whose performance matches optimized architecture-specific implementations. The key to a portable approach is to determine multiple levels of parallelism that can be mapped in different ways to different architectures, and we explain how to do this for the matricized tensor times Khatri-Rao product (MTTKRP), which is …
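The MTTKRP kernel named in the abstract is the dominant cost of CP decomposition for a sparse tensor. As a point of reference only (a minimal NumPy sketch over a COO-format tensor, not the paper's Kokkos-based implementation; all names are illustrative), the kernel and its two natural levels of parallelism look roughly like this:

```python
import numpy as np

def mttkrp_mode0_coo(coords, vals, B, C):
    """Mode-0 MTTKRP for a sparse 3-way tensor stored in COO format.

    coords: (nnz, 3) integer array of nonzero indices (i, j, k)
    vals:   (nnz,)   nonzero values
    B, C:   factor matrices for modes 1 and 2, shapes (n1, r) and (n2, r)
    """
    n0 = coords[:, 0].max() + 1
    r = B.shape[1]
    M = np.zeros((n0, r))
    # Coarse level of parallelism: distribute nonzeros (or blocks of them)
    # across threads/thread blocks; concurrent updates to the same row of M
    # then require atomics or privatized accumulators.
    for (i, j, k), x in zip(coords, vals):
        # Fine level of parallelism: the length-r elementwise update, a
        # natural fit for vector lanes on CPUs or threads within a GPU warp.
        M[i, :] += x * (B[j, :] * C[k, :])
    return M
```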

Cited by 38 publications (31 citation statements)
References 25 publications (49 reference statements)
“…In the literature, there are various CP-ALS implementations adopting different parallelism paradigms [13], [17], [28], [29], [30], [31], [32]. On distributed-memory systems, DMS [17] is the most commonly-used implementation.…”
Section: Related Work
confidence: 99%
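For orientation, the CP-ALS implementations referenced in the statement above share the same outer structure: each factor matrix is updated in turn by a least-squares solve whose dominant cost is an MTTKRP. A hedged Python sketch of a single update follows (generic formulation, not code from any of the cited works; mttkrp_fn is a hypothetical callback):

```python
import numpy as np

def cp_als_update(factors, mttkrp_fn, mode):
    """One ALS update of factors[mode] in a rank-r CP model (illustrative).

    factors:   list of factor matrices A_0, ..., A_{d-1}, each of shape (n_m, r)
    mttkrp_fn: callable mttkrp_fn(mode) returning the (n_mode, r) MTTKRP result
    """
    r = factors[0].shape[1]
    # Gram matrix: Hadamard product of A_m^T A_m over all modes except `mode`.
    V = np.ones((r, r))
    for m, A in enumerate(factors):
        if m != mode:
            V *= A.T @ A
    M = mttkrp_fn(mode)                    # dominant cost: the MTTKRP
    factors[mode] = M @ np.linalg.pinv(V)  # small r-by-r (pseudo)inverse
    return factors
```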
“…Bader and Kolda [3] consider both dense and sparse Y tensors, showing that the cost is O(rn^d) for dense Y and O(r nnz(Y)) for sparse Y. Phan, Tichavsky, and Cichocki [39] propose methods to reuse partial computations when computing the MTTKRP for all d modes in sequence. Much recent work has focused on more efficient representations of sparse tensors and parallel MTTKRP computations [44,24,29,40]. There is also continued work on improving the efficiency of dense MTTKRP calculations [20,5].…”
Section: Tensor Notation
confidence: 99%
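The cost contrast quoted above (O(rn^d) dense versus O(r nnz(Y)) sparse) follows from how the kernel is evaluated: the dense formulation multiplies the mode-n unfolding by an explicitly formed Khatri-Rao product, touching every tensor entry, while the sparse formulation accumulates contributions only from the nonzeros. A small NumPy illustration of the dense mode-0 case for d = 3 (standard textbook formulation, not code from any of the cited works):

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker (Khatri-Rao) product, shape (n1*n2, r)."""
    r = B.shape[1]
    return np.einsum('jr,kr->jkr', B, C).reshape(-1, r)

def mttkrp_mode0_dense(Y, B, C):
    """Dense mode-0 MTTKRP: the mode-0 unfolding times the Khatri-Rao product.

    Y: dense (n0, n1, n2) tensor; B: (n1, r); C: (n2, r).
    Cost is O(r * n0 * n1 * n2), i.e. O(r n^d) for a cubical tensor,
    versus O(r * nnz(Y)) when only the nonzeros are traversed.
    """
    n0 = Y.shape[0]
    Y0 = Y.reshape(n0, -1)            # mode-0 unfolding, shape (n0, n1*n2)
    return Y0 @ khatri_rao(B, C)      # shape (n0, r)
```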
“…In terms of implementations, an interesting consequence of sampling in the context of parallel tensor decomposition [44,24,29,40] is that we can reduce the computation and/or communication by sampling only a subset of the entries. Moreover, we may be able to stratify the samples in such a way that is amenable to more structured communications.…”
Section: Sampling
confidence: 99%
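One way to realize the sampling idea described in the statement above is to accumulate the MTTKRP update over a random subset of entries and rescale. This is a sketch under the assumption of uniform sampling of the nonzeros; the cited works may stratify or weight samples differently:

```python
import numpy as np

def sampled_mttkrp_mode0(coords, vals, B, C, sample_frac, rng=None):
    """Approximate mode-0 MTTKRP from a uniform sample of the nonzeros.

    Touching only a fraction of nnz(Y) reduces computation and, in a
    distributed setting, the number of entries that must be communicated.
    """
    rng = np.random.default_rng() if rng is None else rng
    nnz = vals.shape[0]
    s = max(1, int(sample_frac * nnz))
    idx = rng.choice(nnz, size=s, replace=False)

    n0, r = coords[:, 0].max() + 1, B.shape[1]
    M = np.zeros((n0, r))
    for (i, j, k), x in zip(coords[idx], vals[idx]):
        M[i, :] += x * (B[j, :] * C[k, :])
    return M * (nnz / s)               # rescale so the estimate is unbiased
```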
“…Besides, their performance efficiency is low because of the MATLAB environment. Recently, many other highly performance-efficient libraries have emerged, such as SPLATT [130], Cyclops Tensor Framework (CTF) [132], DFacTo [24], GigaTensor [65], HyperTensor [69], GenTen [110], to name a few. However, these libraries are specific to one or two particular sparse tensor operations, which violates the application diversity requirement.…”
Section: PASTA in Need
confidence: 99%