Black or White? How to Develop an AutoTuner for Memory-based Analytics

Kunjir, Mayuresh; Babu, Shivnath

doi:10.1145/3318464.3380591

Cited by 52 publications

(29 citation statements)

References 33 publications


“…Recently, there has been an active research area on automatically tuning database configurations using Machine Learning (ML) techniques [3,18,24,44,47,55,91,92]. We summarize three key modules in the existing tuning systems: knob selection that prunes the configuration space, configuration optimization that samples promising configurations over the pruned space, and knowledge transfer that further speeds up the tuning process via historical data.…”

Section: Introduction (mentioning)

confidence: 99%

Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation

Zhang, Chang, Yang et al. 2021

Preprint


Recently, using automatic configuration tuning to improve the performance of modern database management systems (DBMSs) has attracted increasing interest from the database community. This is embodied with a number of systems featuring advanced tuning capabilities being developed. However, it remains a challenge to select the best solution for database configuration tuning, considering the large body of algorithm choices. In addition, beyond the applications on database systems, we could find more potential algorithms designed for configuration tuning. To this end, this paper provides a comprehensive evaluation of configuration tuning techniques from a broader perspective, hoping to better benefit the database community. In particular, we summarize three key modules of database configuration tuning systems and conduct extensive ablation studies using various challenging cases. Our evaluation demonstrates that the hyper-parameter optimization algorithms can be borrowed to further enhance the database configuration tuning. Moreover, we identify the best algorithm choices for different modules. Beyond the comprehensive evaluations, we offer an efficient and unified database configuration tuning benchmark via surrogates that reduces the evaluation cost to a minimum, allowing for extensive runs and analysis of new techniques.



“…Kunjir et al [38] investigate memory allocation auto-tuning for applications on distributed data processing systems and propose a white-box algorithm RelM. The RelM is developed to empirically model the interactions of memory management options and provides analytical models to estimate different competing memory pools requirements in an application.…”

Section: Tuning (mentioning)

confidence: 99%

A Survey on Deep Reinforcement Learning for Data Processing and Analytics

Cai, Cui, Xiong et al. 2021

Preprint


Data processing and analytics are fundamental and pervasive. Algorithms play a vital role in data processing and analytics where many algorithm designs have incorporated heuristics and general rules from human knowledge and experience to improve their effectiveness. Recently, reinforcement learning, deep reinforcement learning (DRL) in particular, is increasingly explored and exploited in many areas because it can learn better strategies in complicated environments it is interacting with than statically designed algorithms. Motivated by this trend, we provide a comprehensive review of recent works focusing on utilizing deep reinforcement learning to improve data processing and analytics. First, we present an introduction to key concepts, theories, and methods in deep reinforcement learning. Next, we discuss deep reinforcement learning deployment on database systems, facilitating data processing and analytics in various aspects, including data organization, scheduling, tuning, and indexing. Then, we survey the application of deep reinforcement learning in data processing and analytics, ranging from data preparation, natural language interface to healthcare, fintech, etc. Finally, we discuss important open challenges and future research directions of using deep reinforcement learning in data processing and analytics.


“…Auto-tuning with a performance model which leverages an application characterization to aid the search: [15] and [30] are two publications in the area of Spark auto-tuning that proposes more similar systems to our work, generalizing unseen workloads by some characterization of the application. [15] approach to extract the features is more close to our work as it capture Task and Stage information from an Spark application, while [30] follows a more general approach by profiling the application to extract statistics like the average CPU and disk usage that can work with other Big Data frameworks rather than Spark.…”

Section: E Experiments For RQ2 (mentioning)

confidence: 99%

“…In [30], a vector of statistics is extracted from the runtime of the application to be optimized. This feature vector is used to modify a BO procedure to guide the search process.…”

Section: E Experiments For RQ2 (mentioning)

confidence: 99%


You Only Run Once: Spark Auto-Tuning From a Single Run

Buchaca, Portella, Costa et al. 2020

IEEE Trans. Netw. Serv. Manage.


Tuning configurations of Spark jobs is not a trivial task. State-of-the-art auto-tuning systems are based on iteratively running workloads with different configurations. During the optimization process, the relevant features are explored to find good solutions. Many optimizers enhance the time-to-solution using black-box optimization algorithms that do not take into account any information from the Spark workloads. In this paper, we present a new method for tuning configurations that uses information from one run of a Spark workload. To achieve good performance, we mine the SparkEventLog that is generated by the Spark engine. This log file contains a large amount of information from the executed application. We use this information to enhance a performance model with low-level features from the workload to be optimized. These features include Spark Actions, Transformations, and Task metrics. This process allows us to obtain application-specific workload information. With this information our system can predict sensible Spark configurations for unseen jobs, given that it has been trained with reasonable coverage of Spark applications. Experiments show that the presented system correctly produces good configurations, while achieving up to 80% speedup with respect to the default Spark configuration, and up to 12x speedup of the time-to-solution with respect to a standard Bayesian Optimization procedure.

