2015 IEEE International Conference on Cluster Computing 2015
DOI: 10.1109/cluster.2015.13

Machines Tuning Machines: Configuring Distributed Stream Processors with Bayesian Optimization

Abstract: Modern distributed computing frameworks such as Apache Hadoop, Spark, or Storm distribute the workload of applications across a large number of machines. Whilst they abstract the details of distribution, they do require the programmer to set a number of configuration parameters before deployment. These parameter settings (usually) have a substantial impact on execution efficiency. Finding the right values for these parameters is considered a difficult task and requires domain, application, and framework experti…

Cited by 19 publications (14 citation statements)
References 25 publications
“…A model-based approach is concerned with conducting experiments on a chosen set of configurations to observe their performance [11,30,26,15,24]. The restriction of using a limited number of configurations is a limitation that our approach overcomes because there are numerous configurations in practice with many dependencies between them and these can be used to get better optimization.…”
Section: Related Work (mentioning)
confidence: 99%
“…Such settings are expected to improve load balancing in the cluster. Other suggestions include maintaining a ratio of one task per Executor to prevent the context switching overhead among tasks, as well as having one Acker thread per Worker process [41]. Besides, the total number of CPU-bound tasks should not exceed the total number of Workers to avoid CPU contention, while I/O-bound tasks could exceed that limit [1].…”
Section: Stream Processing Systems (mentioning)
confidence: 99%
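
The rules of thumb quoted in this statement can be written down as simple checks. The following is a minimal sketch under stated assumptions: `TopologyPlan` and `check_rules_of_thumb` are hypothetical names invented for illustration, not part of the Apache Storm API, and the checks encode only the three suggestions quoted above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TopologyPlan:
    """Hypothetical summary of a planned deployment (not a Storm class)."""
    workers: int          # number of Worker processes
    executors: int        # number of Executor threads
    tasks: int            # total number of tasks
    ackers: int           # number of Acker threads
    cpu_bound_tasks: int  # tasks assumed to be CPU-bound


def check_rules_of_thumb(plan: TopologyPlan) -> List[str]:
    """Return one warning per quoted suggestion that the plan violates."""
    warnings = []
    # Suggestion: one task per Executor, to avoid context switching among tasks.
    if plan.tasks != plan.executors:
        warnings.append("tasks and executors differ (suggested ratio is 1:1)")
    # Suggestion: one Acker thread per Worker process.
    if plan.ackers != plan.workers:
        warnings.append("ackers and workers differ (suggested: one Acker per Worker)")
    # Suggestion: CPU-bound tasks should not exceed the number of Workers.
    if plan.cpu_bound_tasks > plan.workers:
        warnings.append("more CPU-bound tasks than workers (risk of CPU contention)")
    return warnings


if __name__ == "__main__":
    plan = TopologyPlan(workers=4, executors=8, tasks=16, ackers=1, cpu_bound_tasks=6)
    for message in check_rules_of_thumb(plan):
        print(message)
```

I/O-bound tasks are deliberately left unchecked, since the statement allows them to exceed the Worker count.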
“…Fischer et al [41] proposed a method to automatically choose configuration for distributed stream processing. In particular, they leverage a Bayesian network to predict the parameters with good performance.…”
Section: Stream Processing Systems (mentioning)
confidence: 99%
“…The results are used to train a statistical model for finding good configurations. Fischer et al [14] and Trotter et al [17] present an auto-tuning algorithm using Bayesian Optimization (BO) [19] to achieve high throughput. Jamshidi et al [15] likewise proposes BO, however, it optimizes latency and leverages Gaussian Processes [20] to continuously estimate the mean and confidence interval of a response variable at yet-to-be explored configurations.…”
Section: Related Work (mentioning)
confidence: 99%
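
The idea described in this statement, using a Gaussian Process to estimate the mean and uncertainty at unexplored configurations and then choosing what to measure next, can be illustrated with a small loop. This is a sketch under stated assumptions and not the method of any cited paper: it tunes a single integer parameter, uses an upper-confidence-bound acquisition rule (chosen here for brevity), and replaces a real cluster measurement with a synthetic function. All names (`rbf_kernel`, `gp_posterior`, `tune`, `fake_throughput`) are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=4.0):
    """Squared-exponential kernel between two 1-D arrays of configurations."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)


def gp_posterior(x_seen, z_seen, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_ss = rbf_kernel(x_seen, x_seen) + noise * np.eye(len(x_seen))
    k_sq = rbf_kernel(x_seen, x_query)
    weights = np.linalg.solve(k_ss, k_sq)  # K^-1 k(x_seen, x_query)
    mean = weights.T @ z_seen
    var = 1.0 - np.sum(k_sq * weights, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def tune(measure, candidates, n_initial=3, n_iterations=10, kappa=2.0, seed=0):
    """Maximize measure(config) over candidates with a GP and a UCB rule."""
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates, dtype=float)
    picks = rng.choice(len(candidates), size=n_initial, replace=False)
    tried = [int(i) for i in picks]
    scores = [measure(candidates[i]) for i in tried]
    for _ in range(n_iterations):
        if len(tried) >= len(candidates):
            break  # every candidate has already been measured
        y = np.asarray(scores, dtype=float)
        scale = y.std() or 1.0          # avoid dividing by zero
        z = (y - y.mean()) / scale      # standardize the observations
        mean, std = gp_posterior(candidates[tried], z, candidates)
        ucb = mean + kappa * std        # optimistic estimate per candidate
        ucb[tried] = -np.inf            # never re-measure a configuration
        nxt = int(np.argmax(ucb))
        tried.append(nxt)
        scores.append(measure(candidates[nxt]))
    best = int(np.argmax(scores))
    return candidates[tried[best]], scores[best]


def fake_throughput(parallelism):
    """Synthetic stand-in for a benchmark run; peaks at parallelism 12."""
    return float(200.0 - (parallelism - 12.0) ** 2)


if __name__ == "__main__":
    config, score = tune(fake_throughput, range(1, 33))
    print("best configuration tried:", config, "score:", score)
```

In a real setting, `measure` would deploy the topology with the given setting and report observed throughput (or negated latency), so each call is expensive. That cost is why the loop spends effort on choosing the next configuration rather than sampling at random.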