2016 IEEE International Conference on Cluster Computing (CLUSTER) 2016
DOI: 10.1109/cluster.2016.22

Spark Versus Flink: Understanding Performance in Big Data Analytics Frameworks

Abstract: Big Data analytics has recently gained increasing popularity as a tool to process large amounts of data on-demand. Spark and Flink are two Apache-hosted data analytics frameworks that facilitate the development of multi-step data pipelines using directed acyclic graph patterns. Making the most out of these frameworks is challenging because efficient executions strongly rely on complex parameter configurations and on an in-depth understanding of the underlying architectural choices. Although extensive …

Cited by 64 publications (50 citation statements)
References: 17 publications
“…The BigDataBench [8] suite contains 19 scenarios covering a broad range of applications and diverse data sets. Marcu et al. [9] performed an extensive analysis of the differences between Apache Spark and Apache Flink on iterative workloads. The above benchmarks either adopt batch processing systems and metrics used in batch processing systems or apply the batch-based metrics on SDPSs.…”
Section: Related Work (mentioning)
confidence: 99%
“…Apache Hadoop has been used in various big data processing fields but cannot meet the real-time computing tasks and requirements [44,45]. Apache Storm only supports stream processing [46], Apache Spark simulates stream processing based on batch processing [47], and Apache Flink is entirely based on stream processing and simulates batch processing through stream processing [48]. Apache Flink can implement both stream processing and batch processing via a single solution, which can help prevent duplication of codes during development.…”
Section: Batch and Stream Computing (mentioning)
confidence: 99%
“…We mention that most of the above presented surveys are limited in terms of both the evaluated features of Big Data frameworks and the number of considered frameworks. For example, in [64], only stream processing frameworks are considered while in [16] [54] [24] [40], only batch processing frameworks are considered. We highlight that our experimental survey differs from the above presented works by the fact that it compares the studied frameworks in the case of both batch and stream processing.…”
Section: Related Work (mentioning)
confidence: 99%