Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond 2018
DOI: 10.1145/3206333.3206339

Automatic Caching Decision for Scientific Dataflow Execution in Apache Spark

Cited by 10 publications
(7 citation statements)
References 11 publications
“…S-CACHE [28] automatically makes a sub-optimal caching decision by analyzing the application's execution flow and cost model, implemented in Apache Spark. It calculates the computational cost of individual caching decisions by considering the dataset's computation cost, cache writes cost, and cache read cost.…”
Section: Related Work (mentioning)
confidence: 99%
“…We obtained 19 papers [20-22, 32, 75-89]. These papers were published in five distinct journal publications, 13 conferences/workshops and one PhD thesis.…”
Section: Related Work (mentioning)
confidence: 99%
“…Considering the work by Gottin et al. [84], an automatic pre-computing strategy computes an optimal combination of cache operations given a dataflow definition and a simple operation cost model for a Spark dataflow, under memory constraints. The work is orthogonal to the one presented in this article as the latter obtains performance improvements that are independent of code changes.…”
Section: Related Work (mentioning)
confidence: 99%
“…The problem of this approach is that it is static, i.e., they do not consider automatic caching. Gottin et al [15] propose an algorithm that finds an optimized cache decision plan for a dataflow execution in Apache Spark. The approach is based on a cost model that uses provenance data, and tries the possible combinations of caching selection in order to select the best one.…”
Section: Related Work (mentioning)
confidence: 99%
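
The citation statements above describe the cited approach as a cost model (computation cost, cache write cost, cache read cost) combined with a search over possible caching selections under a memory constraint. The following is only a minimal sketch of that general idea, not the algorithm from the paper: the dataflow, the cost figures, the sizes, and the names `DATAFLOW`, `ACTIONS`, `plan_cost`, and `best_plan` are all hypothetical, and the search is a plain exhaustive enumeration.

```python
from itertools import combinations

# Hypothetical dataflow (all numbers are made up for illustration):
# name -> (parents, compute_cost, cache_write_cost, cache_read_cost, size)
DATAFLOW = {
    "raw":      ([],          10.0, 2.0, 1.0, 4),
    "cleaned":  (["raw"],     20.0, 2.0, 1.0, 3),
    "features": (["cleaned"],  5.0, 1.0, 0.5, 2),
    "stats":    (["cleaned"],  3.0, 1.0, 0.5, 1),
}
# Datasets requested by successive actions, in execution order.
ACTIONS = ["features", "stats", "features"]


def plan_cost(cached, dataflow=DATAFLOW, actions=ACTIONS):
    """Total cost of running all actions when the datasets in `cached` are cached."""
    materialized = set()  # cached datasets that have already been written

    def materialize(name):
        parents, compute, write, read, _size = dataflow[name]
        if name in cached and name in materialized:
            return read  # served from the cache
        # Otherwise recompute from lineage, recursively materializing parents.
        cost = compute + sum(materialize(p) for p in parents)
        if name in cached:
            cost += write  # first materialization also pays the cache write
            materialized.add(name)
        return cost

    return sum(materialize(a) for a in actions)


def best_plan(memory_budget, dataflow=DATAFLOW, actions=ACTIONS):
    """Try every caching selection that fits in memory; return (cost, selection)."""
    names = list(dataflow)
    best = (plan_cost(frozenset(), dataflow, actions), frozenset())
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            if sum(dataflow[n][4] for n in subset) > memory_budget:
                continue  # selection violates the memory constraint
            cost = plan_cost(frozenset(subset), dataflow, actions)
            if cost < best[0]:
                best = (cost, frozenset(subset))
    return best


if __name__ == "__main__":
    print(plan_cost(frozenset()))  # cost with no caching
    print(best_plan(3))            # best selection with a budget of 3 size units
```

Exhaustive enumeration grows exponentially with the number of datasets, so this sketch only makes sense for small dataflows; the citing papers do not say how the original work bounds that search.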