The Case for Evaluating MapReduce Performance Using Workload Suites

Chen, Yanpei; Ganapathi, Archana; Griffith, Rean; Katz, Randy H.

doi:10.1109/mascots.2011.12

Cited by 326 publications (259 citation statements). References 10 publications.

Citation statements by type: Supporting, Mentioning (245), Contrasting, Unclassified


“…Chen et al describe MapReduce workloads from 6 months on a 600-machine cluster and 1.5 months on a 3000-machine cluster at Facebook, from 3 weeks on a cluster at Yahoo!, and from several other installations [125,124]. Some of their data is shown in Figures 9.34 and 9.35.…”

Section: End Box (mentioning)

confidence: 99%

“…For example, consider the data about MapReduce workloads at Facebook available from the SWIM project [125]. MapReduce applications have two stages, a map stage and a reduce stage (this is explained in the box on page 498).…”

Section: Erroneous Data (mentioning)

confidence: 99%
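The quotation above mentions the two stages of a MapReduce application. As a rough illustration only, the following single-process Python sketch runs a word count through a map stage, a grouping (shuffle) step, and a reduce stage. It is not Hadoop's API or the book's own example; the function names `map_stage`, `shuffle`, `reduce_stage`, and `run_job` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_stage(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word in every record.
    for line in records:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key; a real framework does this between stages.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stage(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce: combine all values for one key into a single result.
    return {key: sum(values) for key, values in groups.items()}


def run_job(records: Iterable[str]) -> Dict[str, int]:
    return reduce_stage(shuffle(map_stage(records)))
```

For example, `run_job(["the cat", "The dog"])` returns `{'the': 2, 'cat': 1, 'dog': 1}`. In a real cluster the map and reduce stages run as many parallel tasks, which is why a workload description needs per-stage attributes rather than a single job size.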

“…Chen et al. have claimed that due to the complex nature of the MapReduce workloads and their many attributes, which typically do not conform to well-known distributions, using randomly mixed samples of traced data is better than using benchmarks or models [125].…”

Section: End Box (mentioning)

confidence: 99%

“…Each file is the concatenation of 24 random one-hour samples from the original log file [125,124]. Available from GitHub: URL https://github.com/SWIMProjectUCB/SWIM/wiki.…”

Section: Facebook MapReduce (mentioning)

confidence: 99%
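The sampling procedure quoted above (each file concatenates 24 random one-hour samples of the original log) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SWIM's actual tooling: the `Job` record and its fields are hypothetical, window start times are assumed to be drawn uniformly at random over the trace, and the same hour may be drawn more than once (sampling with replacement).

```python
import random
from dataclasses import dataclass, replace

HOUR = 3600.0  # window length in seconds


@dataclass(frozen=True)
class Job:
    # Hypothetical minimal job record; real MapReduce traces carry more fields.
    submit_time: float  # seconds since the start of the trace
    input_bytes: int
    shuffle_bytes: int
    output_bytes: int


def sample_workload(trace, num_samples=24, window=HOUR, seed=None):
    """Build a synthetic workload from `num_samples` random `window`-long slices of `trace`.

    Slice i is shifted to occupy [i * window, (i + 1) * window) in the output,
    so the slices are concatenated back to back.
    """
    rng = random.Random(seed)
    trace_end = max(job.submit_time for job in trace)
    latest_start = max(trace_end - window, 0.0)
    workload = []
    for i in range(num_samples):
        # Assumption: window start chosen uniformly; windows may overlap or repeat.
        start = rng.uniform(0.0, latest_start)
        for job in trace:
            if start <= job.submit_time < start + window:
                # Keep the job's offset within its window, moved to slot i.
                new_time = i * window + (job.submit_time - start)
                workload.append(replace(job, submit_time=new_time))
    workload.sort(key=lambda job: job.submit_time)
    return workload
```

Shifting each slice while preserving offsets keeps the within-hour arrival pattern and per-job data sizes of the real trace, while mixing hours drawn from across the whole trace.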


Workload Modeling for Computer Systems Performance Evaluation

Feitelson

2015

191

131


Reliable performance evaluations require the use of representative workloads. This is no easy task since modern computer systems and their workloads are complex, with many interrelated attributes and complicated structures. Experts often use sophisticated mathematics to analyze and describe workload models, making these models difficult for practitioners to grasp. This book aims to close this gap by emphasizing the intuition and the reasoning behind the definitions and derivations related to the workload models. It provides numerous examples from real production systems, with hundreds of graphs. Using this book, readers will be able to analyze collected workload data and clean it if necessary, derive statistical models that include skewed marginal distributions and correlations, and consider the need for generative models and feedback from the system. The descriptive statistics techniques covered are also useful for other domains.



“…To compare the resulting service they need a benchmark built around a set of representative analytical tasks. Most research in the area is done on actual MapReduce benchmarks like MRBench [28] or on designing appropriate MapReduce workloads [29]. Pavlo et al [30] show how to have analytical workload run by both MapReduce and Distributed Databases and compare the results.…”

Section: High Workload Analytical Platform (mentioning)

confidence: 99%

Benchmarking in the Cloud: What It Should, Can, and Cannot Be

Folkerts

Alexandrov

Sachs

et al. 2013

Selected Topics in Performance Evaluation and Benchmarking


With the increasing adoption of Cloud Computing, we observe an increasing need for Cloud Benchmarks, in order to assess the performance of Cloud infrastructures and software stacks, to assist with provisioning decisions for Cloud users, and to compare Cloud offerings. We understand our paper as one of the first systematic approaches to the topic of Cloud Benchmarks. Our driving principle is that Cloud Benchmarks must consider end-to-end performance and pricing, taking into account that services are delivered over the Internet. This requirement yields new challenges for benchmarking and requires us to revisit existing benchmarking practices in order to adapt them to the Cloud.
