2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC/EUC), 2013
DOI: 10.1109/hpcc.and.euc.2013.106
Memory or Time: Performance Evaluation for Iterative Operation on Hadoop and Spark

Cited by 89 publications (71 citation statements)
References 8 publications
“…Recent research on Spark performance analysis mainly focuses on comparing it with similar distributed computing frameworks (e.g., MapReduce [4], Flink [5]) by running benchmarks or application programs [6]. These studies have goals different from ours, mainly performance comparisons that exhibit differences across various scenarios.…”
Section: Related Work
confidence: 99%
“…Most existing works evaluate Spark performance by comparing Spark with similar parallel computing frameworks (e.g., MapReduce [4], Flink [5]) by running benchmarks or application programs [6]. However, no existing research builds an analytical model of the Spark framework or a time-cost model for a specific Spark application.…”
Section: Introduction
confidence: 99%
“…These JVMs behave independently of each other, which can have severe performance consequences. For example, it has been identified that the lack of coordination between JVMs regarding when to perform garbage collection results in significant performance slowdowns [51] in Apache Spark and Cassandra: often, a pause in one JVM to perform garbage collection propagates to the rest due to synchronization requirements, stalling the whole system. Finally, in latency-critical applications (e.g., web servers or databases), these idle intervals can cause requests to take unacceptably long, making a node's data unavailable.…”
Section: A. Optimizing System Software and Language Managed Runtimes
confidence: 99%
“…Spark operates on in-memory distributed datasets, which improve the performance of iterative computation by caching data in memory [36]. Thus, Spark meets the requirements of the real-time taxi recommendation system for high timeliness and low latency [37]. In conclusion, our recommendation system uses Spark to process the raw GPS dataset.…”
Section: Calculation Framework
confidence: 99%
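The excerpt above credits Spark's in-memory caching of datasets for its advantage on iterative computation, which is exactly the memory-versus-time trade-off the cited paper's title names. A minimal plain-Python sketch (not actual Spark API; the function and variable names here are illustrative) of that trade-off: an iterative job either recomputes an expensive transformation every iteration (the disk-backed MapReduce pattern) or materializes it once and reuses it (the Spark `persist()` pattern).

```python
# Illustrative sketch only: contrasts recompute-per-iteration with
# cache-once-reuse, the trade-off evaluated in the cited paper.
compute_calls = 0  # counts how many times the expensive stage runs

def transform(raw):
    """Stand-in for an expensive map/filter stage over the input."""
    global compute_calls
    compute_calls += 1
    return [x * x for x in raw]

def iterate_without_cache(raw, iterations):
    # MapReduce-style: the transformation is recomputed every iteration.
    total = 0
    for _ in range(iterations):
        total += sum(transform(raw))
    return total

def iterate_with_cache(raw, iterations):
    # Spark-style: the transformed dataset is materialized once and
    # held in memory, trading memory footprint for iteration time.
    cached = transform(raw)
    total = 0
    for _ in range(iterations):
        total += sum(cached)
    return total

raw = list(range(1000))

compute_calls = 0
result_no_cache = iterate_without_cache(raw, 10)
calls_no_cache = compute_calls  # 10: one transform per iteration

compute_calls = 0
result_cached = iterate_with_cache(raw, 10)
calls_cached = compute_calls    # 1: transform runs only once
```

Both variants produce the same result; caching cuts the expensive stage from once per iteration to once total, at the cost of keeping the materialized dataset resident in memory.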