2009 IEEE International Symposium on Parallel & Distributed Processing 2009
DOI: 10.1109/ipdps.2009.5161054

A scalable auto-tuning framework for compiler optimization

Abstract: We describe a scalable and general-purpose framework for auto-tuning compiler-generated code. We combine Active Harmony's parallel search backend with the CHiLL compiler transformation framework to generate in parallel a set of alternative implementations of computation kernels and automatically select the one with the best-performing implementation. The resulting system achieves performance of compiler-generated code comparable to the fully automated version of the ATLAS library for the tested kernels. Perfor…

Cited by 170 publications (135 citation statements)
References 20 publications (18 reference statements)
“…Greedy Algorithm and Hill Climbing: Cooper et al [21] use meta-heuristics to find the optimal compilation parameters while reducing the number of evaluations during search space exploration from 10000 to a single one using profiling data and estimated virtual execution.
Genetic Algorithm: Sandrieser et al [88] obtain speedup of 23% for hyper-block formation.
Parallel Rank Ordering: Tiwari and Hollingsworth [93] and Tiwari et al [92] use PRO for automatic tuning of compilation process and report 46% performance improvement compared to the original code.
[57,58,99] Determining the best partitioning strategy: Assumes that for any two functions with similar features, the same partitioning strategy can be used.
[95] Determining loops that benefit from parallelization and their best scheduling policy: Targets OpenMP loop constructs only. Uses profiling to detect loop candidates, which may significantly increase the compilation time.
[1,21] Adaptive tuning of the compilation process: Profiling data needs to be collected to perform the virtual executions.…”
Section: Meta-heuristics (mentioning)
confidence: 99%
“…This section demonstrates the integration of the analytical bounds with existing search optimization algorithms, the Nelder-Mead Simplex method [22] and the Parallel Rank Ordering (PRO) method [30]. In order to handle boundary constraints due to the DL/ML model, we used the extended version of the PRO algorithm introduced in the Active Harmony framework [32]. The same extension to handle boundaries was employed in our implementation of the Simplex method, and its stopping criteria are based on the work by Luersen [21].…”
Section: Search Space Reduction By DL/ML Model (mentioning)
confidence: 99%
“…In contrast, we have demonstrated (see Table 3) that the best performance is often realized only for rectangular tiles. Search-based techniques for finding tile sizes (and unroll factors) have received much attention in performance optimization [4,19,31,32,33]. The ATLAS system employs extensive empirical tuning to find the best tile sizes for different problem sizes in the BLAS library; tuning is done once at installation.…”
Section: Related Work (mentioning)
confidence: 99%
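
The search-based tile-size tuning mentioned in the statement above can be illustrated with a minimal sketch. It is not the method of the cited paper, of ATLAS, or of Active Harmony: it is a plain exhaustive search, and the names `tiled_matmul`, `time_variant`, and `search_tile_sizes` are hypothetical. The three tile dimensions are searched independently, so rectangular tiles are among the candidates.

```python
import time
from itertools import product


def tiled_matmul(a, b, n, ti, tj, tk):
    """Multiply two n x n matrices (lists of lists) using ti x tj x tk tiles."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, ti):
        for k0 in range(0, n, tk):
            for j0 in range(0, n, tj):
                # Work on one tile; min() handles tiles that overhang the edge.
                for i in range(i0, min(i0 + ti, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + tk, n)):
                        aik = a[i][k]
                        row_b = b[k]
                        for j in range(j0, min(j0 + tj, n)):
                            row_c[j] += aik * row_b[j]
    return c


def time_variant(n, tiles, repeats=3):
    """Empirical cost of one variant: best wall-clock time over a few runs."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        tiled_matmul(a, b, n, *tiles)
        best = min(best, time.perf_counter() - start)
    return best


def search_tile_sizes(candidates, cost):
    """Evaluate every (ti, tj, tk) combination and return the cheapest one."""
    best_tiles, best_cost = None, float("inf")
    for tiles in product(candidates, repeat=3):
        current = cost(tiles)
        if current < best_cost:
            best_tiles, best_cost = tiles, current
    return best_tiles, best_cost
```

A call such as `search_tile_sizes([4, 8, 16], lambda t: time_variant(64, t))` would pick the fastest combination on the machine at hand. The result depends on hardware and timing noise, which is why smarter search strategies than exhaustive enumeration are of interest for larger parameter spaces.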