2018
DOI: 10.1007/978-3-319-69953-0_5

On the Performance of Spark on HPC Systems: Towards a Complete Picture

Abstract: Big Data analytics frameworks (e.g., Apache Hadoop and Apache Spark) have been increasingly used by many companies and research labs to facilitate large-scale data analysis. However, with the growing needs of users and size of data, commodity-based infrastructure will strain under the heavy weight of Big Data. On the other hand, HPC systems offer a rich set of opportunities for Big Data processing. As first steps toward Big Data processing on HPC systems, several research efforts have been devoted to understan…

Cited by 2 publications (1 citation statement)
References 20 publications

“…Nonetheless, Big Data and HPC frameworks today remain largely incompatible: programming models and software development tools are inconsistent [5]; trying to mix both models out-of-the-box generates memory overheads and poor scalability in a HPC environment [6]; the disparity between collocated and distributed storage architectures in Big Data and HPC systems, respectively, degrades performance when running Big Data applications on HPC systems [7]; and the usage of merged Big Data models presents limitations, such as high memory consumption and low efficiency in communication between cooperating processes [8].…”
Section: Introduction (mentioning)
confidence: 99%