Database systems play a central role in modern data-centered applications. Their performance is thus a key factor in the efficiency of data processing pipelines. Modern database systems expose several parameters that users and database administrators can configure to tailor the database settings to the specific application considered. While this task has traditionally been performed manually, in the last years several methods have been proposed to automatically find the best parameter configuration for a database. Many of these methods, however, use statistical models that require high amounts of data and fail to represent all the factors that impact the performance of a database, or implement complex algorithmic solutions. In this work we study the potential of a simple model-free general-purpose configuration tool to automatically find the best parameter configuration of a database. We use the irace configurator to automatically find the best parameter configuration for the Cassandra NoSQL database using the YCBS benchmark under different scenarios. We establish a reliable experimental setup and obtain speedups of up to 30% over the default configuration in terms of throughput, and we provide an analysis of the configurations obtained.
The evaluation of the performance of an IT system is a fundamental operation in its benchmarking and optimization. However, despite the general consensus on the importance of this task, little guidance is usually provided to practitioners who need to benchmark their IT system. In particular, many works in the area of database optimization do not provide an adequate amount of information on the setup used in their experiments and analyses. In this work we report an experimental procedure that, through a sequence of experiments, analyzes the impact of various choices in the design of a database benchmark, leading to the individuation of an experimental setup that balances the consistency of the results with the time needed to obtain them. We show that the minimal experimental setup we obtain is representative also of heavier scenarios, which make it possible for the results of optimization tasks to scale.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.