ATH: Auto-Tuning HBase's Configuration via Ensemble Learning (2017)
DOI: 10.1109/access.2017.2716441

Cited by 11 publications (5 citation statements)
References 23 publications
“…A straightforward method [12]-[14], [24], [32]-[34] to solve the configuration parameter optimization problem is to construct an offline prediction model first and then apply some search algorithm to find the optimal configuration online based on this prediction model. For instance, Xiong et al. [24] utilize an ensemble learning algorithm to build the performance-prediction model and leverage a genetic algorithm to search for the optimal configuration parameters for HBase. Similarly, for Spark clusters, Yu et al. [12] propose a hierarchical modeling method to build the prediction model and then employ a genetic algorithm to find the optimal configuration.…”
Section: A. Prediction Model-based Methods (citation type: mentioning)
Confidence: 99%
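The workflow quoted above (an offline performance model queried by an online search) can be illustrated with a small sketch. This is not the ATH authors' code: the parameter selection, value ranges, and throughput numbers below are hypothetical, and a scikit-learn random forest stands in for the paper's ensemble model.

```python
# Minimal sketch of the offline performance-prediction step, assuming a small
# table of previously benchmarked HBase configurations (hypothetical data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Each row: values of a few HBase parameters (hypothetical selection), e.g.
# [handler count, block cache fraction, memstore flush size in MB].
X = np.array([
    [30, 0.40, 128],
    [60, 0.25, 256],
    [90, 0.40, 128],
    [120, 0.60, 512],
])
# Observed throughput (ops/sec) for each configuration, e.g. from YCSB runs.
y = np.array([8200.0, 9100.0, 9800.0, 8700.0])

# An ensemble regressor stands in for the paper's ensemble-learning model.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# The fitted model predicts throughput for unseen configurations, which is
# what the online search queries instead of running real benchmarks.
print(model.predict([[75, 0.5, 256]]))
```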
“…• Black-box objective function. To solve this problem, a straightforward method is to construct a performance prediction model first and then utilize some search-based algorithm to explore the optimal configuration [11]-[14], [24]. However, due to the complex implementation of log search engines, it is very difficult, if not impossible, to figure out the relationship between configuration parameters and performance, and building a useful prediction model usually requires a considerable number of high-quality observations.…”
Section: Introduction (citation type: mentioning)
Confidence: 99%
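The "black-box" nature described in this excerpt can be made concrete: the only way to score a configuration is to apply it and run a workload, so every evaluation is expensive and no analytical form is available. The helpers below are hypothetical stubs, not APIs of any real system.

```python
import random
from typing import Dict

def apply_config(config: Dict[str, float]) -> None:
    # Hypothetical stub: a real implementation would rewrite the system's
    # configuration (e.g. hbase-site.xml) and restart affected services.
    pass

def run_benchmark() -> float:
    # Hypothetical stub: a real implementation would run a workload generator
    # (e.g. YCSB) and return measured throughput; here it returns a dummy value.
    return random.uniform(8000.0, 10000.0)

def objective(config: Dict[str, float]) -> float:
    # The objective has no closed form: the only feedback is a measurement.
    apply_config(config)
    return run_benchmark()
```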
“…A straightforward solution to the configuration parameter auto-tuning problem is to train an offline prediction model first and then apply some search algorithm to find the optimal configuration online based on this prediction model. For example, Random Forest is utilized to build a performance model for HBase [4] and Hadoop [12]. Similarly, for Spark clusters, Yu et al. [13] propose a hierarchical modeling method, while Bei et al. [20] use ensemble learning to build the prediction model and then employ a genetic algorithm to find the optimal configuration.…”
Section: Related Work (citation type: mentioning)
Confidence: 99%
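As a rough illustration of the "prediction model plus genetic algorithm" pipeline these works share, the sketch below searches hypothetical parameter ranges by querying a fitted regression model instead of running benchmarks. The bounds, population size, and generation count are illustrative assumptions, not values from any of the cited papers.

```python
import random
import numpy as np

# Hypothetical search ranges for three numeric parameters
# (e.g. handler count, block cache fraction, memstore flush size in MB).
BOUNDS = [(10, 200), (0.1, 0.8), (64, 1024)]

def random_config():
    return [random.uniform(lo, hi) for lo, hi in BOUNDS]

def mutate(cfg, rate=0.2):
    # Re-sample each gene with probability `rate`.
    return [random.uniform(lo, hi) if random.random() < rate else v
            for v, (lo, hi) in zip(cfg, BOUNDS)]

def crossover(a, b):
    # Uniform crossover: pick each gene from either parent.
    return [random.choice(pair) for pair in zip(a, b)]

def ga_search(model, pop_size=30, generations=20):
    """Maximize predicted throughput; `model` is any fitted regressor,
    e.g. the random forest from the earlier sketch."""
    population = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness comes from the offline model, so each evaluation is cheap.
        fitness = model.predict(np.array(population))
        ranked = [cfg for _, cfg in sorted(zip(fitness, population),
                                           key=lambda t: -t[0])]
        parents = ranked[:pop_size // 2]
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda c: model.predict(np.array([c]))[0])

# Usage (with `model` fitted as in the earlier sketch):
#   best_config = ga_search(model)
```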
“…For instance, JanusGraph itself mainly focuses on graph serialization and query execution, while providing adapters to integrate third-party software as its functional modules for data storage and indices. Unfortunately, although there are already a few significant works on automatically tuning parameters for different databases such as HBase [4], Elasticsearch [5], RocksDB [6], and MySQL [7], these solutions cannot be directly applied to modularized GDBs because each considers only one specific piece of software. Worse, due to the complicated interactions across different modules, sequentially tuning each piece of software with previous solutions may also fail to efficiently find the optimal configuration for modularized GDBs.…”
Section: Introduction (citation type: mentioning)
Confidence: 99%