“…To address this challenge, existing performance-modeling efforts [21,25,29] and machine-learning approaches [4,18,28] must tolerate huge offline training overhead to build an accurate online model for each framework, since they consider only low-level metrics (such as resource utilization) within a single framework. Unfortunately, they must spend considerable time training new models for similar applications on each new framework, even though recent works [3,5,10] have shown that such similar applications, in both Hadoop and Spark, cover a wide range of use cases (micro-benchmarks, machine learning, stream processing, etc.).…”