2018 IEEE 11th International Conference on Cloud Computing (CLOUD) 2018
DOI: 10.1109/cloud.2018.00059

Towards Automatic Tuning of Apache Spark Configuration

Cited by 25 publications (27 citation statements)
References 11 publications
“…In a related, but different direction, Nguyen et al [17] proposed a strategy to generate training data to fit a performance model. The model is meant to be used for predicting which Spark settings yield the smallest application execution time (i.e., capacity planning).…”
Section: Related Work (mentioning)
confidence: 99%
“…They then focus on optimizing the subset of relevant parameters, considerably reducing the number of samples needed to achieve good results. In order to avoid running an application many times to find a good configuration, many works use a machine learning model trained to predict the evaluation metric from past executions [8], [11]-[15]. This model, sometimes called the performance model, aims to replace many of the job execution calls with the predictions provided by the model.…”
Section: A. Popular Methodologies for the Problem (mentioning)
confidence: 99%
“…Note that in equation (2) ∈ can change but ∈ is fixed. In some works [7], [8], is a single variable containing the dataset size. This implies that the only information expected to affect the performance of an application is the dataset size.…”
Section: Problem Formulation and Motivation (mentioning)
confidence: 99%
“…Recently, research on tuning Spark performance has been on the rise [11,25,31,40]. This is due to the increase in the number of applications that use Spark for data intensive tasks [13,28,29]. The work described in [29] is an attempt to find the parameters to tune Spark performance using ML.…”
Section: Related Work (mentioning)
confidence: 99%
“…This is due to the increase in the number of applications that use Spark for data intensive tasks [13,28,29]. The work described in [29] is an attempt to find the parameters to tune Spark performance using ML. Identifying the best Spark configuration parameters for a specific application is very challenging.…”
Section: Related Work (mentioning)
confidence: 99%