In this paper, we address the problem of optimal distribution of computational tasks on a network of heterogeneous computers when one or more tasks do not fit into the main memory of the processors and when relative speeds vary with the problem size. We propose a functional performance model of heterogeneous processors that integrates many essential features of a network of heterogeneous computers having a major impact on its performance such as the processor heterogeneity, the heterogeneity of memory structure, and the effects of paging. Under this model, the speed of each processor is represented by a continuous function of the size of the problem whereas traditional models use single numbers to represent the speeds of the processors. We formulate a problem of partitioning of an n-element set over p heterogeneous processors using this model and design an algorithm of the complexity O(p × log₂ n) solving the problem.
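To make the partitioning problem concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes each processor's speed is given as a hypothetical Python callable `speed(x)` (elements per second at problem size `x`) and that the execution time `x / speed(x)` does not decrease as `x` grows. Under those assumptions it bisects on a common execution time rather than using the paper's own method, so it does not attain the stated complexity; the names `max_size_within` and `partition` are illustrative only.

```python
from typing import Callable, List

# Hypothetical representation: speed(x) = elements per second at problem size x.
SpeedFn = Callable[[float], float]


def max_size_within(speed: SpeedFn, t: float, n: int) -> int:
    """Largest integer x in [0, n] with x / speed(x) <= t.

    Assumes the execution time x / speed(x) is non-decreasing in x.
    """
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid / speed(mid) <= t:
            lo = mid
        else:
            hi = mid - 1
    return lo


def partition(n: int, speeds: List[SpeedFn], iters: int = 60) -> List[int]:
    """Split n elements so that per-processor execution times are roughly equal."""
    # At this time bound, at least one processor could handle all n elements alone.
    t_hi = min(n / s(n) for s in speeds)
    t_lo = 0.0
    for _ in range(iters):
        t_mid = (t_lo + t_hi) / 2
        if sum(max_size_within(s, t_mid, n) for s in speeds) >= n:
            t_hi = t_mid
        else:
            t_lo = t_mid
    sizes = [max_size_within(s, t_hi, n) for s in speeds]
    # The sizes may overshoot n; trim from whichever processor is currently slowest.
    excess = sum(sizes) - n
    while excess > 0:
        i = max(
            range(len(sizes)),
            key=lambda k: sizes[k] / speeds[k](sizes[k]) if sizes[k] else -1.0,
        )
        sizes[i] -= 1
        excess -= 1
    return sizes
```

With constant speed functions this reduces to the traditional single-number model, where each processor receives a share proportional to its speed. A speed function that drops once the problem no longer fits in memory shifts elements away from that processor.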
The energy consumption of computing is a serious environmental concern and mitigating it is an important technological challenge. Accurate measurement of energy consumption during an application execution is key to application-level energy minimization techniques. There are three popular approaches to obtaining such measurements: (a) system-level physical measurements using external power meters; (b) measurements using on-chip power sensors; and (c) energy predictive models. In this work, we present a comprehensive study comparing the accuracy of state-of-the-art on-chip power sensors and energy predictive models against system-level physical measurements using external power meters, which we consider to be the ground truth. We show that the average error of the dynamic energy profiles obtained using on-chip power sensors can be as high as 73% and the maximum reaches 300% for two scientific applications, matrix-matrix multiplication and 2D fast Fourier transform, for a wide range of problem sizes. The applications are executed on three modern Intel multicore CPUs, two Nvidia GPUs and an Intel Xeon Phi accelerator. The average error of the energy predictive models employing performance monitoring counters (PMCs) as predictor variables can be as high as 32% and the maximum reaches 100% for a diverse set of seventeen benchmarks executed on two Intel multicore CPUs (one Haswell and the other Skylake). We also demonstrate that using inaccurate energy measurements provided by on-chip sensors for dynamic energy optimization can result in significant energy losses of up to 84%. We show that, owing to the nature of the deviations of the energy measurements provided by on-chip sensors from the ground truth, calibration cannot improve the accuracy of the on-chip sensors to an extent that would allow them to be used in optimization of applications for dynamic energy. Finally, we present the lessons learned, our recommendations for the use of on-chip sensors and energy predictive models, and future directions.
This paper presents and analyzes two different strategies of heterogeneous distribution of computations solving dense linear algebra problems on heterogeneous networks of computers. The first strategy is based on heterogeneous distribution of processes over processors and homogeneous block cyclic distribution of data over the processes. The second is based on homogeneous distribution of processes over processors and heterogeneous block cyclic distribution of data over the processes. Both strategies were implemented in the mpC language, a dedicated parallel extension of ANSI C for efficient and portable programming of heterogeneous networks of computers. The first strategy was implemented using calls to ScaLAPACK; the second strategy was implemented with calls to LAPACK and BLAS. Cholesky factorization on a heterogeneous network of workstations is used to demonstrate that the heterogeneous distributions have an advantage over the traditional homogeneous distribution.
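The idea behind a heterogeneous block cyclic distribution can be illustrated with a simplified one-dimensional sketch. This is an assumption-laden illustration rather than the paper's mpC implementation: it assumes each process has a single relative speed, that blocks are dealt out in repeating cycles of a fixed length, and that within each cycle a process receives a number of blocks proportional to its speed. The function names `blocks_per_cycle` and `block_owners` and the parameter `cycle` are hypothetical.

```python
from typing import List


def blocks_per_cycle(speeds: List[float], cycle: int) -> List[int]:
    """Share `cycle` blocks among processes in proportion to their relative speeds."""
    total = sum(speeds)
    exact = [cycle * s / total for s in speeds]
    counts = [int(e) for e in exact]
    # Hand out the blocks lost to rounding, largest fractional part first.
    by_remainder = sorted(
        range(len(speeds)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_remainder[: cycle - sum(counts)]:
        counts[i] += 1
    return counts


def block_owners(num_blocks: int, speeds: List[float], cycle: int) -> List[int]:
    """Owner process of each block when the per-cycle pattern repeats cyclically."""
    counts = blocks_per_cycle(speeds, cycle)
    pattern = [proc for proc, c in enumerate(counts) for _ in range(c)]
    return [pattern[b % cycle] for b in range(num_blocks)]
```

With equal speeds and a cycle length equal to the number of processes, this reduces to the ordinary homogeneous block cyclic distribution in which blocks are dealt out round-robin.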
Academic Press
The paper presents a performance model of a network of heterogeneous computers that takes account of the heterogeneity of memory structure and other architectural differences. Under this model, the speed of each processor is represented by a function of the size of the problem whereas standard models use single numbers to represent the speeds of the processors. We prove that this model is more realistic than the standard ones when the network includes computers with significantly different memory structure. We formulate a problem of partitioning of an n-element set over p heterogeneous processors using this advanced performance model and give its efficient solution of the complexity O(p² × log₂ n).
In the paper, we analyse challenges associated with parallel programming for common networks of computers (NoCs) that are, unlike dedicated parallel computer systems, inherently heterogeneous and unreliable. This analysis results in a description of the main features of an ideal parallel program for NoCs. We also outline some recent parallel programming tools that try to respond to some of the challenges.