Parallel Nonnegative CP Decomposition of Dense Tensors

Ballard, Grey; Hayashi, Koby; Kannan, Ramakrishnan

doi:10.1109/hipc.2018.00012

Cited by 17 publications

(24 citation statements)

References 32 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…First, we propose the multi-sweep dimension tree (MSDT) algorithm, which requires the TTM between an order-N input tensor with dimension size s and the first-contracted input matrix once every N −1 N sweeps and reduce the leading persweep computational cost of a rank-R CP-ALS to 2 N N −1 s N R. This algorithm can produce exactly the same results as the standard dimension tree, i.e., it has no accuracy loss. Leveraging a parallelization strategy similar to previous work [3], [10] that performs the dimension tree calculations locally, our benchmark results show a speed-up of 1.25X compared to the state-of-art dimension tree running on 1024 processors.…”

Section: Introductionmentioning

confidence: 75%

“…For an order N tensor with modes of dimension s and CP rank R, N MTTKRPs are necessary in each sweep, each costing 2s N R to the leading order. The state-of-art dimension tree based construction of MTTKRP [16], [29], [3] uses amortization to save the cost, and has been implemented in multiple tensor computation libraries [10], [22]. However, it still requires the computational cost of at least 4s N R for each sweep.…”

Section: Introductionmentioning

confidence: 99%

“…To further accelerate CP-ALS, many researchers leverage different techniques to parallelize and approximate MTTKRP calculations. The parallelization strategies have been developed both for dense tensors on GPUs [12] and distributed memory systems [3], [20], and for sparse tensors on GPUs [27] and distributed memory systems [19], [35], [34], [17]. The communication lower bounds for a single dense MTTKRP computation has been discussed in [5], [4].…”

Section: Introductionmentioning

confidence: 99%

“…Second, we propose a communication-efficient pairwise perturbation algorithm. The implementation also uses a parallelization strategy similar to [3], [10], and reduces the communication cost by constructing the first-order local PP operators in the PP initialization step as well as constructing the local MTTKRP approximations in the PP approximated step. Our benchmark results show that the PP approximated step achieves a speed-up of 1.94X compared to the state-of-art dimension tree running on 1024 processors.…”

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree

Solomonik

2021

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

View full text Add to dashboard Cite

CP tensor decomposition with alternating least squares (ALS) is dominated in cost by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel that is necessary to set up the quadratic optimization subproblems. State-of-art parallel ALS implementations use dimension trees to avoid redundant computations across MTTKRPs within each ALS sweep. In this paper, we propose two new parallel algorithms to accelerate CP-ALS. We introduce the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order N input tensor and the first-contracted input matrix once every (N − 1)/N sweeps. This algorithm reduces the leading order computational cost by a factor of 2(N − 1)/N relative to the best previously known approach. In addition, we introduce a more communication-efficient approach to parallelizing an approximate CP-ALS algorithm, pairwise perturbation. This technique uses perturbative corrections to the subproblems rather than recomputing the contractions, and asymptotically accelerates ALS. Our benchmark results show that the persweep time achieves 1.25X speed-up for MSDT and 1.94X speed-up for pairwise perturbation compared to the state-of-art dimension trees running on 1024 processors on the Stampede2 supercomputer.

show abstract

Section: Introductionmentioning

confidence: 75%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree

Solomonik

2021

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

View full text Add to dashboard Cite

show abstract

“…With modern machine learning applications of NTF in mind, for which input tensor sizes can be extremely large and NTF should be provided as a low-level routine, there would be a definite economical and scientific gain to speeding up NTF algorithms. Radically different approaches exist in the literature to speed up existing algorithms for solving NTF, such as parallel computing, 67,68 compression, and sketching. 69,70 The combinations and relationships between these methods is poorly understood.…”

Section: Making Bcd Significantly Faster With Hermentioning

confidence: 99%

Accelerating block coordinate descent for nonnegative tensor factorization

Ang

Cohen

Gillis

et al. 2021

Numerical Linear Algebra App

View full text Add to dashboard Cite

This paper is concerned with improving the empirical convergence speed of block-coordinate descent algorithms for approximate nonnegative tensor factorization (NTF). We propose an extrapolation strategy in-between block updates, referred to as heuristic extrapolation with restarts (HER). HER significantly accelerates the empirical convergence speed of most existing block-coordinate algorithms for NTF, in particular for challenging computational scenarios, while requiring a negligible additional computational budget.

show abstract

Accelerating alternating least squares for tensor decomposition by pairwise perturbation

Solomonik

2022

Numerical Linear Algebra App

View full text Add to dashboard Cite

The alternating least squares (ALS) algorithm for CP and Tucker decomposition is dominated in cost by the tensor contractions necessary to set up the quadratic optimization subproblems. We introduce a novel family of algorithms that uses perturbative corrections to the subproblems rather than recomputing the tensor contractions. This approximation is accurate when the factor matrices are changing little across iterations, which occurs when ALS approaches convergence. We provide a theoretical analysis to bound the approximation error.Our numerical experiments demonstrate that the proposed pairwise perturbation algorithms are easy to control and converge to minima that are as good as ALS. The experimental results show improvements of up to 3.1× with respect to state-of-the-art ALS approaches for various model tensor problems and real datasets.

show abstract

Parallel Nonnegative CP Decomposition of Dense Tensors

Cited by 17 publications

References 32 publications

Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree

Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree

Accelerating block coordinate descent for nonnegative tensor factorization

Accelerating alternating least squares for tensor decomposition by pairwise perturbation

Contact Info

Product

Resources

About