2012 SC Companion: High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/sc.companion.2012.151

Resource Management for Dynamic MapReduce Clusters in Multicluster Systems

Abstract: State-of-the-art MapReduce frameworks such as Hadoop can easily scale up to thousands of machines and to large numbers of users. Nevertheless, some users may require isolated environments to develop their applications and to process their data, which calls for multiple deployments of MR clusters within the same physical infrastructure. In this paper, we design and implement a resource management system to facilitate the on-demand isolated deployment of MapReduce clusters in multicluster systems. Deplo…


Cited by 12 publications (11 citation statements)
References 11 publications (9 reference statements)
“…For the experiments on Hadoop and YARN, we run 20 map tasks and 20 reduce tasks on the 20 computing nodes. Due to the settings used for Hadoop [33], the map phase will be completed in one wave; all the reduce tasks can also be finished in one wave, without any overlap with the map phase [38]. In Giraph, Stratosphere, and GraphLab, we set the parallelization degree to 20 tasks, also equal to the total number of computing nodes.…”
Section: A Basic Performance: Job Execution Time
confidence: 99%
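The one-wave condition in the excerpt above is simple arithmetic: a phase finishes in a single wave when its task count does not exceed the number of concurrently available task slots. A minimal sketch (the function name `waves` is illustrative, not from the cited papers):

```python
import math

def waves(num_tasks: int, num_slots: int) -> int:
    """Number of scheduling waves needed to run num_tasks
    on num_slots concurrently available task slots."""
    return math.ceil(num_tasks / num_slots)

# 20 map tasks and 20 reduce tasks on 20 nodes (one slot each):
# each phase completes in a single wave.
print(waves(20, 20))  # 1
# With more tasks than slots, extra waves are needed.
print(waves(45, 20))  # 3
```

With one slot per node, setting the task count equal to the node count (as in the experiment) guarantees exactly one wave per phase.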
“…In our previous work [12], we have found that the execution time of disk-intensive jobs increases with the ratio between transient and core nodes, while the performance of compute-intensive jobs is independent of the types of nodes.…”
Section: Node Types
confidence: 88%
“…Koala is a resource manager which co-allocates processors, possibly from multiple clusters, to various HPC applications and to isolated MapReduce [12] frameworks. When resources are available, each framework may receive additional resources from Koala, but it is their decision to accept or reject them.…”
Section: Related Work
confidence: 99%
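The accept-or-reject interaction described above is essentially a resource-offer protocol: the resource manager offers spare nodes, and each framework decides whether to take them. A hypothetical Python sketch (class names and the acceptance rule are assumptions for illustration, not Koala's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class ResourceOffer:
    """An offer of additional nodes from the resource manager."""
    nodes: int

class Framework:
    """Toy framework that accepts extra nodes only while it has
    more pending tasks than nodes to run them on."""

    def __init__(self, pending_tasks: int, nodes: int = 0):
        self.pending_tasks = pending_tasks
        self.nodes = nodes

    def consider(self, offer: ResourceOffer) -> bool:
        # Accept only if the framework can actually use the nodes.
        if self.pending_tasks > self.nodes:
            self.nodes += offer.nodes
            return True
        return False

busy = Framework(pending_tasks=10, nodes=4)
idle = Framework(pending_tasks=0, nodes=4)
print(busy.consider(ResourceOffer(nodes=2)))  # True
print(idle.consider(ResourceOffer(nodes=2)))  # False
```

The key design point the excerpt attributes to Koala is that the decision lives in the framework, not the resource manager: the manager only makes offers, so an idle framework can decline resources it would waste.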
“…Omega [13] addresses resource allocations across applications and resolves conflicts by optimistic concurrency control. In contrast to above studies, Ghit et al [14] propose a resource management system to facilitate the deployment of MapReduce clusters in an on-demand fashion and with …”
Section: A Big Data Platforms
confidence: 99%