2018 IEEE 25th International Conference on High Performance Computing (HiPC)
DOI: 10.1109/hipc.2018.00012
Parallel Nonnegative CP Decomposition of Dense Tensors

Abstract: The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensor data that can enforce nonnegativity of the computed low-rank factors. The principal task is to parallelize the matricized-tensor times Khatri-Rao product (MTTKRP) bottleneck subcomputation. The algorithm is computation efficient, using dimension trees to avoid redundant computation across…
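The MTTKRP subcomputation the abstract names can be illustrated with a small dense NumPy sketch (the function names here are hypothetical; the paper's actual distributed-memory implementation is far more involved):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (I x R) and B (J x R)."""
    I, R = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

def mttkrp(T, factors, n):
    """Matricized-tensor times Khatri-Rao product along mode n of an
    order-3 dense tensor T, given a list of its factor matrices."""
    m1, m2 = [m for m in range(T.ndim) if m != n]
    # C-order unfolding: the last remaining mode varies fastest, so the
    # Khatri-Rao product is taken in the same mode order to match columns.
    K = khatri_rao(factors[m1], factors[m2])
    Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)
    return Tn @ K
```

For a rank-1 tensor T = a ∘ b ∘ c, the mode-0 MTTKRP with factors (a, b, c) reduces to a·(bᵀb)(cᵀc), which makes a convenient sanity check.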

Cited by 17 publications (24 citation statements). References 32 publications.
“…First, we propose the multi-sweep dimension tree (MSDT) algorithm, which requires the TTM between an order-N input tensor with dimension size s and the first-contracted input matrix once every (N−1)/N sweeps, and reduces the leading per-sweep computational cost of a rank-R CP-ALS to (2N/(N−1))·s^N·R. This algorithm produces exactly the same results as the standard dimension tree, i.e., it has no accuracy loss. Leveraging a parallelization strategy similar to previous work [3], [10] that performs the dimension tree calculations locally, our benchmark results show a speed-up of 1.25X compared to the state-of-the-art dimension tree running on 1024 processors.…”
Section: Introduction (mentioning)
confidence: 75%
“…For an order-N tensor with modes of dimension s and CP rank R, N MTTKRPs are necessary in each sweep, each costing 2·s^N·R to the leading order. The state-of-the-art dimension-tree-based construction of MTTKRP [16], [29], [3] uses amortization to save cost, and has been implemented in multiple tensor computation libraries [10], [22]. However, it still requires a computational cost of at least 4·s^N·R for each sweep.…”
Section: Introduction (mentioning)
confidence: 99%
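The amortization these citation statements describe can be sketched for an order-3 tensor: one tensor-times-matrix (TTM) partial contraction is computed and then reused for two of the three MTTKRPs. This is a minimal NumPy illustration of the idea, not the cited libraries' actual dimension-tree code:

```python
import numpy as np

def mttkrp_23_with_tree(T, A, B, C):
    """Mode-2 and mode-3 MTTKRPs of an order-3 tensor T with factor
    matrices A (I x R), B (J x R), C (K x R), sharing one partial TTM."""
    # One TTM with the first factor, computed once and reused:
    # P[r, j, k] = sum_i A[i, r] * T[i, j, k]
    P = np.einsum('ir,ijk->rjk', A, T)
    # Mode-2 MTTKRP: M2[j, r] = sum_k P[r, j, k] * C[k, r]
    M2 = np.einsum('rjk,kr->jr', P, C)
    # Mode-3 MTTKRP: M3[k, r] = sum_j P[r, j, k] * B[j, r]
    M3 = np.einsum('rjk,jr->kr', P, B)
    return M2, M3
```

The single contraction P dominates the cost; both subsequent contractions touch only the much smaller R × J × K intermediate, which is why the dimension tree roughly halves the leading-order per-sweep work relative to computing each MTTKRP from scratch.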
“…With modern machine learning applications of NTF in mind, for which input tensor sizes can be extremely large and NTF should be provided as a low-level routine, there would be a definite economical and scientific gain to speeding up NTF algorithms. Radically different approaches exist in the literature to speed up existing algorithms for solving NTF, such as parallel computing [67], [68], compression, and sketching [69], [70]. The combinations and relationships between these methods are poorly understood.…”
Section: Making BCD Significantly Faster With HER (mentioning)
confidence: 99%