2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC/EUC), 2013
DOI: 10.1109/hpcc.and.euc.2013.106
Memory or Time: Performance Evaluation for Iterative Operation on Hadoop and Spark

Cited by 89 publications (71 citation statements)
References 8 publications
“…Recent research on Spark performance analysis mainly focuses on comparing it with similar distributed computing frameworks (e.g., MapReduce [4], Flink [5]) by running benchmarks or application programs [6]. These studies have goals different from ours, mainly performance comparisons that exhibit differences across various scenarios.…”
Section: Related Work
confidence: 99%
“…Most existing works evaluate Spark performance by comparing Spark with similar parallel computing frameworks (e.g., MapReduce [4], Flink [5]) by running benchmarks or application programs [6]. However, no existing research builds an analytical model of the Spark framework or a time-cost model for a specific Spark application.…”
Section: Introduction
confidence: 99%
“…These JVMs behave independently of each other, which can have severe performance consequences. For example, it has been identified that the lack of coordination between JVMs regarding when to perform garbage collection results in significant performance slowdowns [51] in Apache Spark and Cassandra: often, a pause in one JVM to perform garbage collection propagates to the rest due to synchronization requirements, stalling the whole system. Finally, in latency-critical applications (e.g., web servers or databases), these idle intervals can cause requests to take unacceptably long, making a node's data unavailable.…”
Section: A. Optimizing System Software and Language Managed Runtimes
confidence: 99%
“…Spark operates on in-memory distributed datasets, which improve the performance of iterative computation by caching data in memory [36]. Thus, Spark meets the requirements of the real-time taxi recommendation system for high timeliness and low latency [37]. In conclusion, our recommendation system uses Spark to process the raw GPS dataset.…”
Section: Calculation Framework
confidence: 99%
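The excerpt above credits Spark's in-memory caching of datasets for its advantage on iterative computation, which is exactly the memory-versus-time trade-off the cited paper's title names. A minimal plain-Python sketch (not actual Spark API; the function and variable names here are illustrative) of that trade-off: an iterative job either recomputes an expensive transformation every iteration (the disk-backed MapReduce pattern) or materializes it once and reuses it (the Spark `persist()` pattern).

```python
# Illustrative sketch only: contrasts recompute-per-iteration with
# cache-once-reuse, the trade-off evaluated in the cited paper.
compute_calls = 0  # counts how many times the expensive stage runs

def transform(raw):
    """Stand-in for an expensive map/filter stage over the input."""
    global compute_calls
    compute_calls += 1
    return [x * x for x in raw]

def iterate_without_cache(raw, iterations):
    # MapReduce-style: the transformation is recomputed every iteration.
    total = 0
    for _ in range(iterations):
        total += sum(transform(raw))
    return total

def iterate_with_cache(raw, iterations):
    # Spark-style: the transformed dataset is materialized once and
    # held in memory, trading memory footprint for iteration time.
    cached = transform(raw)
    total = 0
    for _ in range(iterations):
        total += sum(cached)
    return total

raw = list(range(1000))

compute_calls = 0
result_no_cache = iterate_without_cache(raw, 10)
calls_no_cache = compute_calls  # 10: one transform per iteration

compute_calls = 0
result_cached = iterate_with_cache(raw, 10)
calls_cached = compute_calls    # 1: transform runs only once
```

Both variants produce the same result; caching cuts the expensive stage from once per iteration to once total, at the cost of keeping the materialized dataset resident in memory.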