Proceedings of the 1st ACM Workshop on Large-Scale System and Application Performance 2009
DOI: 10.1145/1552272.1552278

Using realistic simulation for performance analysis of MapReduce setups

Abstract: Recently, there has been a huge growth in the amount of data processed by enterprises and the scientific computing community. Two promising trends ensure that applications will be able to deal with ever-increasing data volumes: first, the emergence of cloud computing, which provides transparent access to a large number of compute, storage and networking resources; and second, the development of the MapReduce programming model, which provides a high-level abstraction for data-intensive computing. However, the de…

Cited by 73 publications (43 citation statements)
References 8 publications
“…We replicated the methodology used in recent work [10], using the NS-2 packet-level network simulator [16], so we are able to demonstrate the robustness of our findings. Therefore, the NS-2 simulator has been extended with a DCTCP [17] implementation and is driven by the MRPerf MapReduce simulator [18].…”
Section: Methods | mentioning
confidence: 99%
“…This simulator has been extended with a model of Energy Efficient Ethernet [32], which has been previously validated [8] and used extensively in previous work [33]. The network simulator is driven by the MRPerf MapReduce simulator [34]. We could not use real hardware because the EEE control algorithm is implemented in NIC and switch firmware, and no hardware was available for which we were able to change the packet coalescing settings.…”
Section: A Simulation Environment and Workloads | mentioning
confidence: 99%
“…In other words, the performance for such workloads is bounded by disk latency (time) and throughput (operations/second). Hadoop is based on Google's map-reduce [2] [3] in which a given workload task is broken into small tasks and these tasks are distributed to be processed on different nodes in a cloud cluster. The resource utilization for these benchmarks is categorized into three categories as shown in Table 1.…”
Section: Introduction | mentioning
confidence: 99%