Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion 2016
DOI: 10.1145/2908961.2931692

Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction

Abstract: Apache Spark is a popular framework for large-scale data analytics. Unfortunately, Spark's performance can be difficult to optimise, since queries freely expressed in source code are not amenable to traditional optimisation techniques. This article describes Hylas, a tool for automatically optimising Spark queries embedded in source code via the application of semantics-preserving transformations. The transformation method is inspired by functional programming techniques of "deforestation", which eli…
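For readers unfamiliar with the term, deforestation in functional programming generally refers to fusing a chain of collection operations so that intermediate collections are never built. The sketch below illustrates that general idea in plain Python. It is not taken from the paper and does not represent how Hylas works; the function names `unfused` and `fused` and the example pipeline are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List


def unfused(xs: Iterable[int]) -> List[int]:
    """Two-stage pipeline that materialises an intermediate list."""
    # Stage 1: map. This builds a full intermediate list in memory.
    doubled = [x * 2 for x in xs]
    # Stage 2: filter, then map again over the intermediate list.
    return [y + 1 for y in doubled if y % 3 == 0]


def fused(xs: Iterable[int]) -> List[int]:
    """Same result computed in a single pass with no intermediate list."""
    out: List[int] = []
    for x in xs:
        y = x * 2          # first map, applied element by element
        if y % 3 == 0:     # filter
            out.append(y + 1)  # second map
    return out


if __name__ == "__main__":
    data = list(range(10))
    # The rewrite is semantics-preserving: both versions agree on the output.
    assert unfused(data) == fused(data)
    print(fused(data))
```

Both functions return the same list for any input, which is the sense in which such a rewrite preserves semantics; only the amount of intermediate allocation differs.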

Cited by 9 publications (3 citation statements)
References: 17 publications
“…GI is often applied to non-functional properties of software but perhaps it is most famous for improving program's functionality, e.g. by removing bugs [8,9,10,11,12,13,14,15,16] or adding to its abilities [17,18,19,20,21,22]. Non-functional improvements that have been considered or results reported include: faster code [23,24], code which uses less energy [25,26,27,28,29,30,31,32,33,34] or less memory [35], and automatic parallelisation [36,37,38] and automatic porting [39] and embedded systems [40,41,25,42,43,44,45] as well as refactorisation [46], reverse engineering [47,48] and software product lines [49,50].…”
Section: Genetic Improvement (mentioning)
confidence: 99%
“…Beginning with early work on compiler optimisation [3], there is an extensive body of work applying semantics-preserving transformations to improve the nonfunctional properties (NFPs) of software. Recent work in this area includes Kocsis et al [16], which yield a 10,000-fold speedup of database queries on terabyte datasets within the Apache Spark analytics framework by eliminating redundant database joins and other transformations. Kocsis et al also automatically repaired 451 systematic errors in the implementation of the Apache Hadoop HPC framework [17], whilst simultaneously significantly improving performance.…”
Section: Threats To Validity (mentioning)
confidence: 99%
“…Our previous work on improvement of human-written programs that is semantics-preserving includes: a 10,000-fold speedup of database queries on terabyte datasets within the Apache Spark analytics framework [6]; automatic generation of an energy-efficient version of the Quicksort algorithm (consuming half the power of the popular ‘median of first, mid, last’ method due to Sedgewick) on pathological input distributions [7]; a 24% improvement in energy consumption by optimizing a single widely-used class in Google’s Guava collection library [8]; automatic repair of over 400 systematic errors in the implementation of the Apache Hadoop analytics framework whilst simultaneously significantly improving performance [9]; automated speedup of concurrent versions of divide-and-conquer algorithms (Quicksort, Strassen matrix multiplication and the FFT) [10]. However, each of these approaches require experiments to be explicitly framed (in a separate manner for each technique) in terms of semantics-preserving operations.…”
Section: Introduction (mentioning)
confidence: 99%