Hermes Senger scite author profile

Data abundance poses the need for powerful and easy-to-use tools that support processing large amounts of data. MapReduce has been increasingly adopted for over a decade by many companies, and more recently, it has attracted the attention of an increasing number of researchers in several areas. One main advantage is that the complex details of parallel processing, such as complex network programming, task scheduling, data placement, and fault tolerance, are hidden in a conceptually simple framework. MapReduce is supported by mature software technologies for deployment in data centers such as Hadoop. As MapReduce becomes popular for high-performance applications, many questions arise concerning its performance and efficiency.In this paper, we demonstrated formally lower bounds on the isoefficiency function for MapReduce applications, when these applications can be modeled as BSP jobs. We also demonstrate how communication and synchronization costs can be dominant for MapReduce computations and discuss the conditions under which such scalability limits are valid. To our knowledge, this is the first study that demonstrates scalability bounds for MapReduce applications. We also discuss how some MapReduce implementations such as Hadoop can mitigate such costs to approach linear, or near-to-linear speedups.

show abstract

Comparing Distributed Online Stream Processing Systems Considering Fault Tolerance Issues

Gradvohl¹,

Senger²,

Arantes³

et al. 2014

JETWI

View full text Add to dashboard Cite

This paper presents an analysis of four online stream processing systems (MillWheel, S4, Spark Streaming and Storm) regarding the strategies they use for fault tolerance. We use this sort of system for processing of data streams that can come from different sources such as web sites, sensors, mobile phones or any set of devices that provide real-time high-speed data. Typically, these systems are concerned more with the throughput in data processing than on fault tolerance. However, depending on the type of application, we should consider fault tolerance as an important a feature. The work describes some of the main strategies for fault tolerance-replication components, upstream backup, checkpoint and recovery-and shows how each of the four systems uses these strategies. In the end, the paper discusses the advantages and disadvantages of the combination of the strategies for fault tolerance in these systems.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Hermes Senger

Improving scalability of Bag-of-Tasks applications running on master–slave platforms

Scalability limits of Bag-of-Tasks applications running on hierarchical platforms

BSP cost and scalability analysis for MapReduce operations

Comparing Distributed Online Stream Processing Systems Considering Fault Tolerance Issues

Contact Info

Product

Resources

About