Intelligent partitioning models are commonly used for efficient parallelization of irregular applications on distributed systems. These models usually aim to minimize a single communication cost metric, which is related either to communication volume or to message count. However, both volume- and message-related metrics should be taken into account during partitioning for a more efficient parallelization. Only a few works consider both, and they usually address each in a separate phase of a two-phase approach. In this work, we propose a recursive hypergraph bipartitioning framework that reduces the total volume and the total message count in a single phase. In this framework, the standard hypergraph models, whose nets already capture the bandwidth cost, are augmented with message nets. The message nets encode the message count, so that minimizing the conventional cutsize minimizes the bandwidth and latency costs together. Our model provides a more accurate representation of the overall communication cost by incorporating both the bandwidth and the latency components into the partitioning objective. Using the widely adopted recursive bipartitioning framework provides the flexibility of using any existing hypergraph partitioner. Experiments on instances from different domains show that our model on average achieves up to a 52% reduction in total message count, and hence a 29% reduction in parallel running time, compared to the model that considers only the total volume.
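To make the idea of a single cutsize covering both costs concrete, the following is a minimal sketch, not the authors' implementation. It assumes the cut-net metric for one bipartitioning step (a net contributes its cost when its pins fall on both sides). The function name `bipartition_cutsize`, the toy hypergraph, and the net costs (1 for volume nets, 2 for the message net) are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# A net is (cost, pins): pins are vertex ids connected by the net.
Net = Tuple[int, Sequence[int]]

def bipartition_cutsize(nets: List[Net], side: Sequence[int]) -> int:
    """Sum the costs of nets whose pins lie on both sides of a bipartition.

    side[v] is 0 or 1, the side assigned to vertex v.
    """
    total = 0
    for cost, pins in nets:
        sides_touched = {side[v] for v in pins}
        if len(sides_touched) > 1:  # the net is cut
            total += cost
    return total

# Hypothetical toy instance: two volume nets (cost 1) and one message net
# (cost 2, an assumed weight standing in for the latency cost of a message).
volume_nets: List[Net] = [(1, [0, 1]), (1, [1, 2, 3])]
message_nets: List[Net] = [(2, [0, 3])]
side = [0, 0, 1, 1]

volume_cut = bipartition_cutsize(volume_nets, side)
combined_cut = bipartition_cutsize(volume_nets + message_nets, side)
print(volume_cut, combined_cut)
```

Because both kinds of nets live in the same hypergraph, a partitioner that minimizes the ordinary cutsize trades off the two costs according to the net weights, without needing a second phase.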
Tensor decomposition is widely used in the analysis of multi-dimensional data. The canonical polyadic decomposition (CPD) is one of the most popular decomposition methods and is commonly computed by the CPD-ALS algorithm. The high computational and memory costs of CPD-ALS necessitate the use of a distributed-memory parallel algorithm for efficiency. The medium-grain CPD-ALS algorithm, which adopts multi-dimensional cartesian tensor partitioning, is one of the most successful distributed CPD-ALS algorithms for sparse tensors. This is because cartesian partitioning imposes nice upper bounds on communication overheads. However, this model does not utilize the sparsity pattern of the tensor to reduce the total communication volume. The objective of this work is to fill this gap in the literature. We propose a novel hypergraph-partitioning model, CartHP, whose partitioning objective correctly encapsulates the minimization of the total communication volume of multi-dimensional cartesian tensor partitioning. Experiments on twelve real-world tensors using up to 1024 processors validate the effectiveness of the proposed CartHP model. Compared to the baseline medium-grain model, CartHP achieves average reductions of 52, 43 and 24 percent in total communication volume, communication time and overall runtime of CPD-ALS, respectively.
Index Terms: Sparse tensor, canonical polyadic decomposition, cartesian partitioning, load balancing, communication volume, hypergraph partitioning
1 INTRODUCTION
Tensors are multi-dimensional arrays consisting of zero or more dimensions (modes). Applications that make use of tensors often benefit from tensor decomposition to discover the latent features of the modes. The most popular tensor decomposition method for this purpose is the canonical polyadic decomposition (CPD) [1], [2], [3]. CPD is an extension of singular value decomposition to tensors and approximates a given tensor as a sum of rank-one tensors.
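The phrase "a sum of rank-one tensors" can be illustrated with a small sketch for a three-mode tensor, where each rank-one term is the outer product of one vector per mode. This is an illustration under assumptions only: the helper names `rank_one` and `cp_sum` and the toy vectors are hypothetical, and the code builds a tensor from given components rather than computing a decomposition.

```python
from typing import List, Sequence

Tensor3 = List[List[List[float]]]

def rank_one(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> Tensor3:
    """Outer product of three vectors: T[i][j][k] = a[i] * b[j] * c[k]."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]

def cp_sum(A: Sequence[Sequence[float]],
           B: Sequence[Sequence[float]],
           C: Sequence[Sequence[float]]) -> Tensor3:
    """Sum R rank-one tensors; A[r], B[r], C[r] are the r-th component vectors."""
    I, J, K = len(A[0]), len(B[0]), len(C[0])
    X = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for a, b, c in zip(A, B, C):
        T = rank_one(a, b, c)
        for i in range(I):
            for j in range(J):
                for k in range(K):
                    X[i][j][k] += T[i][j][k]
    return X

# Hypothetical rank-2 example on a 2 x 2 x 2 tensor.
A = [[1, 2], [0, 1]]
B = [[1, 0], [1, 1]]
C = [[1, 1], [2, 0]]
X = cp_sum(A, B, C)
print(X)
```

In a real decomposition the direction is reversed: the tensor is given, and the component vectors are the unknowns to be fitted.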
CPD is successfully utilized in a large variety of applications from different domains, such as chemometrics [4], telecommunications [5], medical imaging [6], [7], image compression and analysis [8], text mining [9], [10], knowledge bases [11] and recommendation systems [12]. Kolda and Bader [3] provide an extensive survey on tensor decomposition methods and their applications. One common method for computing CPD is the CPD-ALS algorithm, which exploits the alternating least squares method [13]. CPD-ALS includes a bottleneck operation called Matricized Tensor Times Khatri-Rao Product (MTTKRP), which requires large amounts of computation and memory. This necessitates an efficient distributed-memory implementation of the CPD-ALS algorithm. Recently, Smith and Karypis [14] proposed a successful distributed-memory implementation of the CPD-ALS algorithm. Their algorithm adopts a medium-grain model, in which a cartesian partition of the input tensor is utilized. Cartesian partitioning has the nice property of confining the communications to the layers of a virtual multi-dimensional processor...