Search engines are now widely used to store and analyze logs generated by large-scale distributed systems. To adapt to various workload scenarios, log search engines such as Elasticsearch usually expose a large number of performance-related configuration parameters. As manual configuration is time-consuming and labor-intensive, automatically tuning configuration parameters to optimize performance has become an urgent need. However, it is challenging because: 1) owing to the complex implementation, the relationship between performance and configuration parameters is difficult to model, so the objective function is effectively a black box; 2) in addition to application parameters, JVM and kernel parameters are also closely related to performance, and together they form a high-dimensional configuration space; 3) to iteratively search for the best configuration, a tool is needed to automatically deploy each newly generated configuration and launch tests to measure the corresponding performance. To address these challenges, this paper designs and implements HDConfigor, an automatic holistic configuration parameter tuning tool for log search engines. To solve the high-dimensional optimization problem, we propose a modified Random EMbedding Bayesian Optimization algorithm (mREMBO), a black-box approach, in HDConfigor. Instead of directly applying a black-box optimization algorithm such as Bayesian optimization (BO) to the full configuration space, mREMBO first generates a lower-dimensional embedded space by introducing a random embedding matrix and then performs BO in this embedded space. As a result, HDConfigor is able to find a competitive configuration automatically and quickly. We evaluate HDConfigor in an Elasticsearch cluster with different workload scenarios. Experimental results show that, compared with the default configuration, the best relative median indexing results achieved by mREMBO reach 2.07×.
In addition, under the same number of trials, mREMBO is able to find a configuration with at least a further 10.31% improvement in throughput compared to Random search, Simulated Annealing and BO.
INDEX TERMS Log search engine, configuration parameter tuning, black-box optimization, Bayesian optimization, random embedding.
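The random-embedding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' mREMBO: the modification they propose is not described here, and the BO step is replaced by plain random sampling in the embedded space so that the example stays self-contained. The names `make_embedding`, `project`, `search_embedded`, and `toy_objective`, the `[-1, 1]` box for normalized parameters, and the `sqrt(d)` bound on the embedded space are all assumptions made for illustration.

```python
import random


def make_embedding(D, d, rng):
    """Draw a random D x d embedding matrix with standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]


def project(A, y, low=-1.0, high=1.0):
    """Map a low-dimensional point y to the D-dimensional box via x = clip(A y)."""
    return [min(high, max(low, sum(a * yj for a, yj in zip(row, y)))) for row in A]


def search_embedded(objective, D, d, trials, seed=0):
    """Search a d-dimensional embedded space and evaluate in the full D-dimensional space.

    Assumption: candidates are sampled uniformly at random in the embedded space;
    a real tuner would use a BO acquisition step here instead.
    """
    rng = random.Random(seed)
    A = make_embedding(D, d, rng)
    bound = d ** 0.5  # assumed size of the embedded search box
    best_x, best_val = None, float("-inf")
    for _ in range(trials):
        y = [rng.uniform(-bound, bound) for _ in range(d)]
        x = project(A, y)      # full configuration, normalized to [-1, 1]
        val = objective(x)     # stands in for deploying and benchmarking
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


def toy_objective(x):
    """Hypothetical performance function that depends on only 2 of the D parameters."""
    return -((x[3] - 0.5) ** 2 + (x[11] + 0.25) ** 2)


if __name__ == "__main__":
    best_x, best_val = search_embedded(toy_objective, D=20, d=2, trials=200, seed=7)
    print(len(best_x), best_val)
```

The sketch relies on the assumption that only a few of the many parameters strongly affect performance, which is what makes searching a low-dimensional embedding worthwhile; in a real tuner each call to `objective` would correspond to one deploy-and-measure trial.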
To support different application scenarios, graph databases (GDBs) usually provide a large number of performance-related parameters for developers. Since manual configuration is both time-consuming and costly, automatically tuning configuration parameters to achieve better performance has become an urgent need. In addition, to meet various graph management requirements, GDBs have begun to adopt a modular architecture that interoperates with a wide range of storage and index backends. Because of the complicated interactions among different modules, sequentially tuning each software component with previous solutions may become trapped in a local optimum, so it is necessary to jointly autotune the cross-module configuration parameters. Toward this challenging target, we propose JointConf, a new black-box approach for jointly autotuning configuration parameters for modularized GDBs. To address the resulting high-dimensional black-box optimization problem, JointConf utilizes the recently proposed BO_dropout algorithm. Inspired by the dropout algorithm in neural networks, BO_dropout uses dimension dropout to make high-dimensional Bayesian optimization efficient. We evaluate the effectiveness of JointConf on a local distributed JanusGraph cluster with three different graph query benchmark applications, and experimental results show its advantages over four baseline search-based approaches. The necessity of joint tuning for modularized GDBs is also verified in our experiments.
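The dimension-dropout idea mentioned above can be sketched as follows. This is an illustrative simplification rather than the BO_dropout algorithm itself: at each trial only a random subset of `d` dimensions is varied, the remaining dimensions are copied from the best configuration found so far (one possible fill-in strategy, assumed here), and the BO proposal step is replaced by uniform random sampling. The names `dropout_search` and `toy_objective` and the `[0, 1]` parameter range are hypothetical.

```python
import random


def dropout_search(objective, D, d, trials, seed=0, low=0.0, high=1.0):
    """Optimize over a random d-dimensional subset of the D parameters per trial.

    Assumption: the active dimensions are proposed by uniform sampling;
    a real implementation would fit a surrogate model on those dimensions
    and maximize an acquisition function instead.
    """
    rng = random.Random(seed)
    best_x = [rng.uniform(low, high) for _ in range(D)]
    best_val = objective(best_x)
    for _ in range(trials):
        active = rng.sample(range(D), d)   # dimensions kept in this trial
        candidate = list(best_x)           # dropped dimensions copy the incumbent
        for i in active:
            candidate[i] = rng.uniform(low, high)
        val = objective(candidate)         # stands in for one benchmark run
        if val > best_val:
            best_x, best_val = candidate, val
    return best_x, best_val


def toy_objective(x):
    """Hypothetical cross-module performance function with an interaction term."""
    return -((x[0] - 0.8) ** 2 + (x[5] - 0.2) ** 2 + (x[0] * x[5] - 0.16) ** 2)


if __name__ == "__main__":
    best_x, best_val = dropout_search(toy_objective, D=12, d=3, trials=100, seed=1)
    print(best_val)
```

Because any pair of parameters can end up in the same active subset, parameters belonging to different modules can be varied together, which is the property joint tuning depends on.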