Balanced resource allocations across multiple dynamic MapReduce clusters

Ghit, Bogdan; Yigitbasi, Nezih; Iosup, Alexandru; Epema, Dick

doi:10.1145/2591971.2591998

Cited by 22 publications

(10 citation statements)

References 32 publications

(22 reference statements)


“…During the past decade, performance of MapReduce became a rich exploration domain, leading to several papers focusing on diverse aspects of MapReduce scheduling: data locality [31], stragglers [5], [6], resource heterogeneity [33], or elastic scaling [12], [13], [21]. State-of-the-art schedulers for MapReduce-based systems assume they have complete control over a fixed set of resources, thus they are typically deployed on dedicated clusters of machines.…”

Section: E Improvements From Tyrex

mentioning

confidence: 99%

“…TYREX uses resource partitioning and work-conserving job migration across these partitions as its two main principles. A common way of partitioning the resources of a datacenter is to allocate disjoint sets of machines to multiple instances of the MapReduce framework [12]. However, this scheduling model is not attractive for jobs that are moved across partitions but still require access to the same data, as the cost of replicating the data across partitions may be prohibitive.…”

mentioning

confidence: 99%


Tyrex: Size-Based Resource Allocation in MapReduce Frameworks

Ghit

Epema

2016

2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Self Cite


Abstract: Many large-scale data analytics infrastructures are employed for a wide variety of jobs, ranging from short interactive queries to large data analysis jobs that may take hours or even days to complete. As a consequence, data-processing frameworks like MapReduce may have workloads consisting of jobs with heavy-tailed processing requirements. With such workloads, short jobs may experience slowdowns that are an order of magnitude larger than large jobs do, while the users may expect slowdowns that are more in proportion with the job sizes. To address this problem of large job slowdown variability in MapReduce frameworks, we design a scheduling system called TYREX that is inspired by the well-known TAGS task assignment policy in distributed-server systems. In particular, TYREX partitions the resources of a MapReduce framework, allowing any job running in any partition to read data stored on any machine, imposes runtime limits in the partitions, and successively executes parts of jobs in a work-conserving way in these partitions until they can run to completion. We develop a statistical model for dynamically setting the runtime limits that achieves near-optimal job slowdown performance, and we empirically evaluate TYREX on a cluster system with workloads consisting of both synthetic and real-world benchmarks. We find that TYREX cuts in half the job slowdown variability while preserving the median job slowdown when compared to state-of-the-art MapReduce schedulers such as FIFO and FAIR. Furthermore, TYREX reduces the job slowdown at the 95th percentile by more than 50% when compared to FIFO and by 20-40% when compared to FAIR.
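
The partition-and-limit mechanism described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `route_job`, the `limits` list, and the assumptions that each limit bounds only the processing time a job spends in that one partition and that the final partition is unlimited are all hypothetical, chosen for illustration.

```python
from typing import List, Tuple


def route_job(job_size: float, limits: List[float]) -> List[Tuple[int, float]]:
    """Return (partition index, work executed there) pairs for one job.

    Assumptions (not from the source): limits[i] is the runtime limit of
    partition i, and one extra partition with no limit follows the last
    limited one. Work done in earlier partitions is kept when the job
    moves on, which is the work-conserving behaviour the abstract names.
    """
    remaining = job_size
    trace: List[Tuple[int, float]] = []
    for i, limit in enumerate(limits):
        done = min(remaining, limit)   # run until completion or the limit
        trace.append((i, done))
        remaining -= done
        if remaining <= 0:             # job finished in this partition
            return trace
    # Whatever is left runs to completion in the unlimited last partition.
    trace.append((len(limits), remaining))
    return trace


if __name__ == "__main__":
    # A job needing 5 time units with limits of 2 and 4 units.
    print(route_job(5, [2, 4]))
```

Under these assumptions, short jobs finish in the first partition, while long jobs move through the partitions in turn without repeating work already done.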



“…However, the policy does not consider cluster shrink requests. Ghit et al [155] extended the above-mentioned policies by accounting dynamic demand (job, data, and task), dynamic usage (processor, disk, and memory), and actual performance (job slowdown, job throughput, and task throughput) analysis when resizing a MapReduce cluster.…”

Section: Resource Allocation Mechanisms For Geo-distributed Systems

mentioning

confidence: 99%

A Survey on Geographically Distributed Big-Data Processing Using MapReduce

Dolev

Florissi

Gudes

et al.

2019

IEEE Trans. Big Data


Abstract: Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many industries, e.g., Google, Facebook, and Amazon, for solving a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and social network analysis. However, all these popular systems have a major drawback in terms of locally distributed computations, which prevent them in implementing geographically distributed data processing. The increasing amount of geographically distributed massive data is pushing industries and academia to rethink the current big-data processing systems. The novel frameworks, which will be beyond state-of-the-art architectures and technologies involved in the current system, are expected to process geographically distributed data at their locations without moving entire raw datasets to a single location. In this paper, we investigate and discuss challenges and requirements in designing geographically distributed data processing frameworks and protocols. We classify and study batch processing (MapReduce-based systems), stream processing (Spark-based systems), and SQL-style processing geo-distributed frameworks, models, and algorithms with their overhead issues.


“…The highlighted components cover the minimum set of layers necessary for execution for the MapReduce ecosystem; the presence of several high-level languages indicates that the ecosystem has diverse users, with minimal expertise and ability in managing the ecosystem beyond the high-level language they know. This reference architecture was useful to our research, design, and engineering: with it as a guide, we have created the Fawkes elastic MapReduce system [94].…”

Section: Datacenters: Designing the Digital Factory

mentioning

confidence: 99%

The AtLarge Vision on the Design of Distributed Systems and Ecosystems

Iosup

Versluis

Trivedi

et al.

2019

2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)

Self Cite


High-quality designs of distributed systems and services are essential for our digital economy and society. Threatening to slow down the stream of working designs, we identify the mounting pressure of scale and complexity of (eco-)system, of ill-defined and wicked problems, and of unclear processes, methods, and tools. We envision design itself as a core research topic in distributed systems, to understand and improve the science and practice of distributed (eco-)system design. Toward this vision, we propose the ATLARGE design framework, accompanied by a set of 8 core design principles. We also propose 10 key challenges, which we hope the community can address in the following 5 years. In our experience so far, the proposed framework and principles are practical, and lead to pragmatic and innovative designs for large-scale distributed systems.

arXiv:1902.05416v1 [cs.DC] 14 Feb 2019

