Summary

Nested loops are the largest source of parallelism in many data-parallel scientific applications. Heterogeneous distributed systems are popular computing platforms for such applications. Data partitioning is critical in exploiting the computational power of these systems, and existing data partitioning algorithms try to maximize the performance of data-parallel applications by finding a data distribution that balances the workload between the processing nodes while minimizing communication costs. This paper addresses the problem of 3-dimensional data partitioning for 3-level perfectly nested loops on heterogeneous distributed systems. The primary aim is to minimize the execution time by improving load balancing and minimizing internode communication. We propose a new data partitioning algorithm using dynamic programming, build a theoretical model to estimate the execution time of each partition, and select the partition with minimum estimated execution time as a near-optimal solution. We demonstrate the effectiveness of the new algorithm for 2 data-parallel scientific applications on heterogeneous distributed systems. The new algorithm reduces the execution time by 7% to 17%, on average, compared with leading data partitioning methods on 3 heterogeneous distributed systems.

architecture and the need to meet ever-increasing computing needs of scientific applications, the computational capacity of the clusters is often increased [8,9]. As a first approach, we could replace all processing nodes with newer, faster ones. In this case, the cluster remains homogeneous over time, but a complete replacement of all nodes can be very costly. As a second approach, we could upgrade the cluster by adding more processing nodes that use a newer technology with higher speed. Also, we could aggregate several clusters to use their combined computational power for solving computing problems [8]. Another approach is adding graphics processing units (GPUs) to improve the performance of existing nodes [9]. In the latter 3 cases, the cluster becomes heterogeneous [10][11][12]. In this way, heterogeneous computing systems have emerged as an important contributor of computational capacity in high-performance computing. In fact, the prevalence of heterogeneous systems in the TOP500 list grew from 3.4% to 18.0%
Nested loops are among the most time-consuming parts, and the largest sources of parallelism, in many scientific applications. In this paper, we address the problem of 3-dimensional tiling and scheduling of three-level perfectly nested loops with dependencies on heterogeneous systems. To exploit the parallelism, we tile and schedule nested loops with dependencies, taking into account the computational power of the processing nodes, and execute them in pipeline mode. The tile size plays an important role in the parallel execution time of nested loops. We develop and evaluate a theoretical model to estimate the parallel execution time of tiled nested loops. We also propose a tiling genetic algorithm that uses the proposed model to find a near-optimal tile size, minimizing the parallel execution time of nested loops with dependencies. We demonstrate the accuracy of the theoretical model and the effectiveness of the proposed tiling genetic algorithm through several experiments on heterogeneous systems. The 3D tiling reduces the parallel execution time by a factor of 1.2× to 2× over the 2D tiling when parallelizing the 3D heat equation as a benchmark.
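To make the idea of 3D tiling concrete, the following Python sketch tiles a three-level perfectly nested loop whose iterations depend on their predecessors along each axis, and splits one loop dimension among nodes in proportion to their relative speeds. It is an illustration under stated assumptions, not the algorithm of either paper: the loop body, the function names (`sweep_untiled`, `sweep_tiled`, `split_by_speed`), and the proportional split rule are all hypothetical, and neither the theoretical execution-time model nor the genetic search for the tile size is reproduced.

```python
from itertools import product


def make_grid(n):
    """Build an n x n x n grid with a deterministic fill (illustrative data)."""
    return [[[float(i + j + k) for k in range(n)] for j in range(n)]
            for i in range(n)]


def sweep_untiled(a):
    """Reference loop nest: each point depends on its three predecessors,
    i.e. dependence vectors (1,0,0), (0,1,0) and (0,0,1)."""
    n = len(a)
    for i in range(1, n):
        for j in range(1, n):
            for k in range(1, n):
                a[i][j][k] = (a[i - 1][j][k] + a[i][j - 1][k] + a[i][j][k - 1]) / 3.0


def sweep_tiled(a, ti, tj, tk):
    """Same computation, visited tile by tile with tile size ti x tj x tk.

    Rectangular tiling is legal here because every dependence vector is
    non-negative in all dimensions, so visiting tiles in lexicographic
    order computes each predecessor before the point that reads it.
    """
    n = len(a)
    for ii, jj, kk in product(range(1, n, ti), range(1, n, tj), range(1, n, tk)):
        for i in range(ii, min(ii + ti, n)):
            for j in range(jj, min(jj + tj, n)):
                for k in range(kk, min(kk + tk, n)):
                    a[i][j][k] = (a[i - 1][j][k] + a[i][j - 1][k] + a[i][j][k - 1]) / 3.0


def split_by_speed(n, speeds):
    """Split n iterations among nodes in proportion to their relative speeds
    (assumed rule: largest-remainder rounding so the sizes sum to n)."""
    total = sum(speeds)
    quotas = [n * s / total for s in speeds]
    sizes = [int(q) for q in quotas]
    leftover = n - sum(sizes)
    # Hand the remaining iterations to the nodes with the largest fractional quota.
    order = sorted(range(len(speeds)), key=lambda p: quotas[p] - sizes[p], reverse=True)
    for p in order[:leftover]:
        sizes[p] += 1
    return sizes


if __name__ == "__main__":
    reference, tiled = make_grid(8), make_grid(8)
    sweep_untiled(reference)
    sweep_tiled(tiled, 2, 3, 4)
    print("tiled result matches reference:", reference == tiled)
    print("block sizes for speeds 1:1:2 over 10 iterations:", split_by_speed(10, [1, 1, 2]))
```

In a distributed setting, each tile would be a unit of work assigned to a node, and tiles along a dependence direction would be executed in pipeline fashion. The tile dimensions `ti`, `tj`, and `tk` are the quantities that a tile-size search would tune.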