Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2018
DOI: 10.1145/3173162.3173187
Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing

Cited by 67 publications (63 citation statements)
References 35 publications
“…A straightforward method [12]-[14], [24], [32]-[34] to solve the configuration parameter optimization problem is to first construct an offline prediction model and then apply search algorithms online to find the optimal configuration based on this prediction model. For instance, Xiong et al. [24] utilize an ensemble learning algorithm to build the performance-prediction model and leverage a genetic algorithm to search for the optimal configuration parameters for HBase.…”
Section: A. Prediction Model-based Methods
confidence: 99%
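The two-phase approach described in this excerpt can be sketched in a few lines. This is a minimal illustration, not the cited work's method: the parameter names and the synthetic runtime function are invented, and a nearest-neighbour lookup stands in for the ensemble learner used by Xiong et al.

```python
import random

# Hypothetical 2-parameter config space (names are illustrative,
# not the actual parameters tuned in the cited work).
SPACE = {
    "executor_memory_gb": list(range(1, 17)),
    "shuffle_partitions": list(range(50, 451, 50)),
}

def true_runtime(cfg):
    # Stand-in for a real benchmark run; the tuner never calls this
    # online, it is only used to label offline training samples.
    m, p = cfg["executor_memory_gb"], cfg["shuffle_partitions"]
    return (m - 8) ** 2 + ((p - 200) / 50) ** 2

# --- offline phase: run sampled configs once, record runtimes ---
random.seed(0)
samples = []
for _ in range(60):
    cfg = {k: random.choice(v) for k, v in SPACE.items()}
    samples.append((cfg, true_runtime(cfg)))

def predict(cfg):
    # Toy surrogate: runtime of the nearest sampled config
    # (an ensemble regressor would replace this in practice).
    return min(samples,
               key=lambda s: sum((s[0][k] - cfg[k]) ** 2 for k in SPACE))[1]

# --- online phase: genetic search over the surrogate, no real runs ---
def mutate(cfg):
    child = dict(cfg)
    k = random.choice(list(SPACE))
    child[k] = random.choice(SPACE[k])
    return child

def crossover(a, b):
    return {k: random.choice((a[k], b[k])) for k in SPACE}

pop = [{k: random.choice(v) for k, v in SPACE.items()} for _ in range(20)]
for _ in range(30):
    pop.sort(key=predict)          # rank by predicted runtime
    elite = pop[:5]
    pop = elite + [mutate(crossover(random.choice(elite),
                                    random.choice(elite)))
                   for _ in range(15)]

best = min(pop, key=predict)
print(best)
```

The key property this sketch shows is that the expensive benchmark runs are confined to the offline sampling phase; the genetic search only queries the cheap surrogate.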
“…If that were the case, amortization could be done over the lifetime of a cluster rather than for individual workloads. The main difficulties posed for training such a model are: 1) hundreds of executions are needed to build it [26]; 2) difficulties in adapting to dynamic resource allocation in the cluster; 3) the high diversity of the workloads makes it harder to build a single cost model of a good accuracy [7]; 4) the high dimensionality of the search space: one dimension per configuration parameter. Complex data processing frameworks such as Spark commonly have 20-60 parameters that are relevant for tuning, and our experimental evaluation with system-wide models yields results around 40% worse than optimal.…”
Section: Tuning Cost Amortization
confidence: 99%
“…git
$ cd tuneful-code
$ mvn clean package
$ /usr/lib/spark/bin/spark-submit
Table 3. We selected those parameters as they cover a wide range of Spark's internal aspects (memory, processing, shuffle and network aspects) and represent a superset of the ones used in the related work [26,27], with approximately 2 × 10^40 possible configurations in total (this represents the size of the search space).…”
Section: Appendix A.1 Experiments Reproducibility
confidence: 99%
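A search-space size like 2 × 10^40 is simply the product of the per-parameter domain sizes. The arithmetic can be checked with illustrative domain sizes (these are invented for the sketch, not the actual parameter list from the cited appendix):

```python
from math import prod, log10

# Hypothetical domain sizes for 30 tunable parameters: a few with
# distinctive ranges plus 26 with ~20 discrete settings each.
domain_sizes = [16, 9, 40, 100] + [20] * 26

total = prod(domain_sizes)  # exhaustive search would need this many runs
print(f"search space ~ 10^{log10(total):.0f} configurations")
```

With roughly 30 parameters of 10-100 settings each, the product lands around 10^40, which is why exhaustive enumeration is ruled out and model-guided search is needed.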