2015 IEEE International Conference on Big Data (Big Data), 2015
DOI: 10.1109/bigdata.2015.7363768

Spark deployment and performance evaluation on the MareNostrum supercomputer

Abstract: In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. As far as we know, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present some benchmark data to provide insights into the scalability of the system. We examine the impact of different configurations including parallelism, stor…

Cited by 22 publications (42 citation statements)
References 12 publications
“…In this work, we experiment with the MareNostrum petascale supercomputer at the Barcelona Supercomputing Center in Spain. After configuring the cluster in an application-independent way according to the results in [8], we examine the impact of configurable parameters on a range of applications and derive a simple trial-and-error tuning methodology that can be applied to each Spark application separately. We test our methodology using three case studies with particularly encouraging results.…”
Section: Introduction (mentioning)
confidence: 99%
“…5 The profiles for these five steps and the potential repartitioning are shown in Figure 7 when the flow is executed on a cluster from 4 to 16 nodes on MN3.…”
Section: Real Case-study (mentioning)
confidence: 99%
“…To avoid imbalanced execution, the degree of partitioning must be equal to the number of cores multiplied by a small integer. However, based on (i) the evidence in [5] that, for CPU-intensive applications on MN3, the most efficient configuration of the degree of partitioning is to be set equal to the number of cores, and (ii) the evidence in [6], where the main performance bottlenecks are the CPU ones, in this work, we always set the degree of partitioning to the number of cores. Also, we allow the usage of complete machines, each consisting of 16 cores.…”
Section: Our Setting and The Benchmarking Applications (mentioning)
confidence: 99%
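
The partitioning rule quoted in the citing statement above (degree of partitioning equal to the number of cores multiplied by a small integer, with complete 16-core machines) can be illustrated with a short sketch. The function name `degree_of_partitioning` and its parameters are hypothetical and are not taken from either paper; the node counts of 4, 8, and 16 only mirror the 4-to-16-node range mentioned in the quoted case study.

```python
def degree_of_partitioning(nodes: int, cores_per_node: int = 16, multiplier: int = 1) -> int:
    """Return a partition count equal to total cores times a small integer.

    Assumptions for illustration only: whole machines are used, each with
    16 cores, and multiplier=1 matches the quoted choice of setting the
    degree of partitioning equal to the number of cores.
    """
    if nodes < 1 or cores_per_node < 1 or multiplier < 1:
        raise ValueError("nodes, cores_per_node and multiplier must be positive integers")
    total_cores = nodes * cores_per_node
    return total_cores * multiplier


if __name__ == "__main__":
    # Partition counts for clusters of whole 16-core machines.
    for nodes in (4, 8, 16):
        print(nodes, "nodes:", degree_of_partitioning(nodes), "partitions")
```

In a Spark job, a value computed this way could plausibly be supplied through the `spark.default.parallelism` configuration property or as an explicit partition count when repartitioning; whether the cited authors set it that way is not stated in the excerpt.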