2018
DOI: 10.1016/j.procs.2018.08.243

Adaptive performance model for dynamic scaling Apache Spark Streaming

citations
Cited by 18 publications
(15 citation statements)
references
References 2 publications
“…With increasing training data sizes, the models can achieve high prediction accuracy and near-optimal recommendations. However, obtaining training data can be a costly process, as it requires runs under different settings to avoid under-fitting. The situation becomes worse as workloads can change dynamically, and unseen applications appear.…”
Table rows interleaved with the quoted text (system [ref] | feature | approach):
[97] | Adjusts number of allocated map/reduce slots | Cost functions
MROnline [78] | Supports aggressive and conservative tuning | Gray-box Hill Climbing
Ant [29] | Supports heterogeneous cluster nodes | Genetic Algorithm
JellyFish [36] | Performs dimensionality reduction of search space | Model-based Hill Climbing
KERMIT [45] | Optimizes CPU and memory for workloads | Global and local search
Stream:
T-Storm [125] | Optimizes number of Worker processes in Storm | Traffic-aware Scheduling
Das et al [32] | Adapts batch size in Spark Streaming | Fixed-Point Iteration
DRS [43] | Adapts # of Workers and parallelism hint per operator | Queueing Theory
Drizzle [112] | Adapts batch size and grouping in Spark Streaming | Group and pre-scheduling
Petrov et al [96] | Decides # of AWS worker nodes and Spark Executors | Cost performance model
Section: Discussion | mentioning
confidence: 99%
“…Last, to achieve higher throughput, Drizzle further implements various optimization approaches both within and across batches. Petrov et al [96] propose a performance model for stream data processing to adaptively assign optimal resources (i.e., number of worker nodes and Executors) to workloads. The framework collects various statistics and system utilization metrics and then uses the models for deciding when and how to scale the current application to maximize throughput.…”
mentioning
confidence: 99%
“…Das et al [31] applied a robust and effective algorithm that adaptively tunes the batch size for promoting the performance of Spark Streaming. Petrov et al [32] proposed a robust and adaptive performance model for Spark Streaming to achieve the goal of allocating resources dynamically and reducing the total cost.…”
Section: Related Work | mentioning
confidence: 99%
“…; multiple streams join since such cases/scenarios are non-trivial to solve and may be considered for future work. Furthermore, the delay of reporting time in case of missed and delayed events could be reduced by identifying the optimal amount of resources to satisfy the required processing delay under specific stream rate change [59]. Other adaptive streaming techniques, such as adaptive load shedding for windowed stream joins [13] and dynamic batch sizing [17], can be integrated with our proposed dynamic windowing based on stream rate change for industrial applications.…”
Section: Challenges | mentioning
confidence: 99%