2019
DOI: 10.48550/arxiv.1909.01149
Preprint

PLANC: Parallel Low Rank Approximation with Non-negativity Constraints

Srinivas Eswar, Koby Hayashi, Grey Ballard, et al.

Abstract: We consider the problem of low-rank approximation of massive dense non-negative tensor data, for example to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes and performing efficient and scalable parallel algorithms to compute the low-rank approximation.
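As a point of reference, here is a minimal single-node numpy sketch of the computation PLANC parallelizes: a rank-R nonnegative CP approximation of a dense order-3 tensor via alternating least squares. The clip-to-nonnegative update is a crude stand-in for the proper nonnegative least-squares solvers the paper supports (e.g., block principal pivoting or HALS), and all names here are illustrative, not PLANC's API.

```python
# Minimal single-node sketch (illustrative only, not PLANC's API):
# rank-R nonnegative CP approximation of a dense order-3 tensor via
# alternating least squares with a clip-to-nonnegative update.
import numpy as np

def khatri_rao(A, B):
    # Column-wise Khatri-Rao product; rows are indexed by (row of A,
    # row of B), with B's row index varying fastest.
    I, R = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

def ncp_als(T, R, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    dims = T.shape
    F = [rng.random((d, R)) for d in dims]  # nonnegative initial factors
    for _ in range(iters):
        for n in range(3):
            a, b = [m for m in range(3) if m != n]
            KR = khatri_rao(F[a], F[b])                     # (I_a*I_b) x R
            Tn = np.moveaxis(T, n, 0).reshape(dims[n], -1)  # mode-n unfolding
            M = Tn @ KR                                     # MTTKRP, the key kernel
            G = (F[a].T @ F[a]) * (F[b].T @ F[b])           # Hadamard of Gram matrices
            # Unconstrained LS solve followed by clipping; PLANC instead
            # uses true nonnegative least-squares solvers at this step.
            F[n] = np.clip(np.linalg.solve(G, M.T).T, 1e-12, None)
    return F

# Example: approximate a small random nonnegative tensor at rank 4.
T = np.random.default_rng(1).random((20, 15, 10))
factors = ncp_als(T, R=4)
```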

Cited by 1 publication (6 citation statements) | References 43 publications
“…First, we propose the multi-sweep dimension tree (MSDT) algorithm, which requires the TTM between an order-N input tensor with dimension size s and the first-contracted input matrix once every (N−1)/N sweeps and reduces the leading per-sweep computational cost of a rank-R CP-ALS to 2N/(N−1) · s^N · R. This algorithm produces exactly the same results as the standard dimension tree, i.e., it has no accuracy loss. Leveraging a parallelization strategy similar to previous work [3], [10] that performs the dimension tree calculations locally, our benchmark results show a speed-up of 1.25× compared to the state-of-the-art dimension tree running on 1024 processors.…”
Section: Introduction
confidence: 75%
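To make the quoted cost concrete: assuming the standard count of 2·s^N·R flops for one TTM between the order-N tensor and an s × R factor matrix (an assumption; the statement above gives only the final figure), the per-sweep cost follows directly:

```latex
% One TTM every (N-1)/N sweeps is N/(N-1) TTMs per sweep on average.
% At 2 s^N R flops per TTM, the leading per-sweep cost of rank-R CP-ALS is
\frac{N}{N-1} \cdot 2\, s^N R \;=\; \frac{2N}{N-1}\, s^N R,
% matching the figure quoted above.
```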
“…Our parallel algorithms for CP-ALS on dense tensors are based on Algorithm 3, which is introduced in [3], [10]. The input tensor T of order N is uniformly distributed across an order-N processor grid P, and all the factor matrices are initially distributed such that each processor owns a subset of the rows.…”
Section: E. Parallel CP-ALS
confidence: 99%
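The distribution the statement describes can be sketched schematically (no MPI, invented sizes; a simplified view in which factor-row ownership mirrors the tensor partition along each mode, whereas the actual algorithm spreads each factor's rows over all processors):

```python
# Schematic sketch of the data distribution quoted above: an order-3
# dense tensor partitioned over an order-3 processor grid. Sizes and
# names are invented for illustration.
import numpy as np
from itertools import product

dims = (8, 6, 4)   # global tensor dimensions (illustrative)
grid = (2, 3, 2)   # processor grid, one axis per tensor mode

def local_block(coord):
    # Index ranges of the tensor block owned by the processor at `coord`;
    # each mode is split evenly across the corresponding grid axis.
    return tuple(slice(c * d // g, (c + 1) * d // g)
                 for c, d, g in zip(coord, dims, grid))

T = np.arange(np.prod(dims), dtype=float).reshape(dims)
for coord in product(*map(range, grid)):
    print(coord, T[local_block(coord)].shape)  # every block is 4 x 2 x 2
```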