Large scale graph analytics are an important class of problem in the modern data center. However, while data centers are trending towards a large number of heterogeneous processing nodes, graph analytics frameworks still operate under the assumption of uniform compute resources. In this paper, we develop heterogeneity-aware data ingress strategies for graph analytics workloads using the popular Power-Graph framework. We illustrate how simple estimates of relative node computational throughput can guide heterogeneityaware data partitioning algorithms to provide balanced graph cutting decisions. Our work enhances five online data ingress strategies from a variety of sources to optimize application execution for throughput differences in heterogeneous data centers. The proposed partitioning algorithms improve the runtime of several popular machine learning and data mining applications by as much as a 65% and on average by 32% as compared to the default, balanced partitioning approaches. CCS Concepts •Computer systems organization → Cloud computing; Heterogeneous (hybrid) systems; •Information systems → Data layout; •Computing methodologies → Distributed programming languages; •Mathematics of computing → Graph algorithms;
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.