2011
DOI: 10.1145/2076022.1993495
Garbage collection auto-tuning for Java MapReduce on multi-cores

Abstract: MapReduce has been widely accepted as a simple programming pattern that can form the basis for efficient, large-scale, distributed data processing. The success of the MapReduce pattern has led to a variety of implementations for different computational scenarios. In this paper we present MRJ, a MapReduce Java framework for multi-core architectures. We evaluate its scalability on a four-core, hyperthreaded Intel Core i7 processor, using a set of standard MapReduce benchmarks. We investigate the significant impact…
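The programming pattern the abstract refers to can be illustrated with a minimal word-count sketch. This is not the MRJ API from the paper (which is not shown here), only a hypothetical example of the map/group/reduce phases running across cores on the JVM, using the standard parallel-streams library:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of the MapReduce pattern on a multi-core JVM.
// Map phase: split the input into words (done in parallel across cores).
// Shuffle phase: group identical words together.
// Reduce phase: count the occurrences in each group.
public class WordCount {
    static Map<String, Long> countWords(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .parallel()                        // map tasks spread across cores
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(    // shuffle: group by key
                        Function.identity(),
                        Collectors.counting()));   // reduce: sum per key
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
                countWords("the quick fox jumps over the lazy dog");
        System.out.println(counts.get("the")); // prints 2
    }
}
```

A multi-core framework like the one the paper describes would manage the parallel map and reduce tasks itself; the allocation behaviour of the intermediate (word, count) pairs is exactly the kind of heap pressure that makes GC tuning relevant here.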

Cited by 10 publications (6 citation statements) · References 30 publications
“…In addition to search-based algorithms, there have been several efforts to use machine learning for the purpose of automatic parameter tuning [26,47]. However, approaches using machine learning generally require a large amount of training data to be able to build a classifier with good accuracy because they can only learn about scenarios and configurations that have been seen in the past.…”
Section: Related Work
confidence: 99%
“…Most of the workloads have been used in popular data analysis workload suites such as BigDataBench [7], DCBench [6], HiBench [14] and Cloudsuite [5]. Phoenix++ [15], Phoenix rebirth [16] and Java MapReduce [17] tests the performance of devised sharedmemory frameworks based on Word Count, Grep and K-Means. We use Spark version of the selected benchmarks from BigDataBench and employ Big Data Generator Suite (BDGS), an open source tool, to generate synthetic datasets for every benchmark based on raw data sets [18].…”
Section: B. Top-down Methods for Hardware Performance Counters
confidence: 99%
“…Machine Learning Models: Machine learning approaches [13,14,19,29,49] employ various algorithms such as KCCA [6], artificial neural networks, decision trees, reinforcement learning, and Bayesian networks [39] to determine a correlation between configuration parameters and performance. The principal obstacle to the widespread adoption of machine learning techniques is the difficulty of the model building process.…”
Section: Related Work
confidence: 99%
“…Typical state of the practice in industry is to employ rules-of-thumb, and to rely on past experience to guess at relevant configuration parameters. More recent academic work has examined techniques based on domain-specific analytical cost models [5,24,25,26,61,62], hill climbing algorithms on customized frameworks [31], machine learning techniques [13,14,19,29,49], and genetic algorithms executed on the real application [32]. While these techniques are promising, they have not been widely deployed due to their inherent limitations.…”
Section: Introduction
confidence: 99%