The MapReduce programming paradigm has proved to be a useful approach for building highly scalable data processing systems. One important reason for its success is its simplicity, which extends to its fault tolerance mechanisms. However, this simplicity comes at a price: efficiency. MapReduce's fault tolerance scheme stores too much intermediate information on disk, which increases job completion time. In particular, this inefficiency precludes the use of MapReduce in near real-time scenarios where jobs need to produce results quickly. In this paper, we discuss an alternative fault tolerance scheme that is inspired by virtual synchrony. The key feature of our approach is low-overhead deterministic execution, which reduces the amount of persistently stored information. In addition, because persisting intermediate results is no longer required for fault tolerance, we use more efficient communication techniques that considerably improve job completion time and throughput. Our contribution is twofold: (i) we enable the use of MapReduce for jobs ranging from a few seconds to a few tens of seconds, meeting these deadlines even in the presence of failures; (ii) we considerably reduce the fault tolerance overhead and, as a consequence, the overall overhead of MapReduce. Our modifications are transparent to the application.
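Why deterministic execution reduces what must be persisted can be illustrated with a small sketch. It assumes, hypothetically, that each map output record carries the ID of the mapper that produced it and its sequence number in that mapper's stream. A reduce task that consumes records in a fixed (key, mapper_id, seq) order then produces identical output regardless of the order in which records arrive over the network. Provided its inputs can be re-delivered, a replacement task can therefore recompute the same result after a failure instead of relying on intermediate data persisted to disk. This illustrates only the determinism property, not the virtual-synchrony-based mechanism the paper describes. The record format and the function names `deterministic_order` and `reduce_concat` are assumptions.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical map output record: (key, value, mapper_id, seq), where seq is
# the record's position in the producing mapper's output stream.
Record = Tuple[str, str, int, int]

def deterministic_order(records: List[Record]) -> List[Record]:
    """Sort by (key, mapper_id, seq) so the reducer's input sequence does not
    depend on the order in which records arrived."""
    return sorted(records, key=lambda r: (r[0], r[2], r[3]))

def reduce_concat(records: List[Record]) -> Dict[str, List[str]]:
    """An order-sensitive reduce: collect each key's values in input order."""
    out: Dict[str, List[str]] = {}
    for key, value, _mapper_id, _seq in deterministic_order(records):
        out.setdefault(key, []).append(value)
    return out

if __name__ == "__main__":
    map_outputs = [("a", "x", 1, 0), ("b", "y", 1, 1), ("a", "z", 0, 0), ("a", "w", 0, 1)]
    # Simulate a re-execution after a failure in which records arrive in a different order.
    replay = list(map_outputs)
    random.Random(7).shuffle(replay)
    assert reduce_concat(map_outputs) == reduce_concat(replay)
    print(reduce_concat(map_outputs))  # {'a': ['z', 'w', 'x'], 'b': ['y']}
```

Without the deterministic ordering step, an order-sensitive reduce like this one could emit different output on re-execution, so recovery could not simply recompute the lost result.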
Abstract. Replicated services often rely on a leader to order client requests and broadcast state updates. In this work, we present POLE, a leader election algorithm that selects leaders using application-specific scores. This flexibility enables the application to tailor leader election to metrics that matter in practical settings but have been overlooked by existing approaches; recovery time and request latency are two such metrics. To evaluate POLE, we use ZooKeeper, an open-source replicated service used for coordinating Web-scale applications. Our evaluation in realistic wide-area settings shows that application scores can have a significant impact on performance, and that optimizing only the latency of consensus does not translate into lower latency for clients. An important conclusion from our results is that a single general strategy satisfying a wide range of requirements is difficult to obtain, which implies that configurability is indispensable for practical leader election.
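The idea of selecting a leader by an application-supplied score can be sketched as follows. This is a minimal illustration, not POLE's actual algorithm or any ZooKeeper API. The `Replica` type, the metric names `client_latency_ms` and `log_lag` (a hypothetical proxy for recovery time), and the lowest-ID tie-breaking rule are all assumptions. It also assumes every replica already sees the same scores; agreeing on those inputs is a separate problem the sketch does not address.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Replica:
    replica_id: int
    metrics: Dict[str, float]  # hypothetical per-replica measurements

# An application-supplied scoring function: higher is better.
ScoreFn = Callable[[Replica], float]

def elect_leader(replicas: List[Replica], score: ScoreFn) -> Replica:
    """Return the replica with the highest score. Ties go to the lowest
    replica_id, so identical inputs always yield the same winner."""
    if not replicas:
        raise ValueError("no replicas to elect from")
    return max(replicas, key=lambda r: (score(r), -r.replica_id))

def latency_score(r: Replica) -> float:
    # Lower observed client latency -> higher score.
    return -r.metrics["client_latency_ms"]

def recovery_score(r: Replica) -> float:
    # Smaller log lag (hypothetical proxy for recovery time) -> higher score.
    return -r.metrics["log_lag"]

if __name__ == "__main__":
    replicas = [
        Replica(1, {"client_latency_ms": 80.0, "log_lag": 0.0}),
        Replica(2, {"client_latency_ms": 35.0, "log_lag": 120.0}),
        Replica(3, {"client_latency_ms": 50.0, "log_lag": 10.0}),
    ]
    print(elect_leader(replicas, latency_score).replica_id)   # 2
    print(elect_leader(replicas, recovery_score).replica_id)  # 1
```

Swapping the scoring function changes the outcome. With these sample values, a latency-oriented score favors replica 2, while a recovery-oriented score favors replica 1. This mirrors the abstract's point that no single strategy fits every requirement.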