Thread-level speculation and profile-guided parallelization techniques exploit the fact that many statically detected data and control flow dependences do not manifest themselves in every possible program execution. Instead, many of these may-dependences occur only infrequently, e.g. in corner cases, or not at all for any legal program input. While the effectiveness of dynamic parallelization techniques critically depends on the absence of such dependences, little is known about their nature. In this paper, we present an empirical analysis and characterization of the variability of both data dependences and control flow across program runs. We run the CBENCH benchmark suite with 100 randomly chosen input data sets and record complete control and data flow traces. Based on these traces, we build a whole-program control and data flow graph (CDFG) for each run and compare the resulting graphs to obtain a measure of the variance in the observed control and data flow. We show that, on average, cumulative profile information gathered from at least 55, and up to 100, different input data sets is needed to achieve full coverage of the data flow observed across all runs. For control flow, between 46 and 100 data sets are needed. This suggests that profile-guided parallelization must be applied with utmost care, as we observed misclassification of sequential loops as parallel even when up to 94 input data sets were used.
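
The cumulative-coverage measure described above can be sketched as follows. This is a minimal illustration under assumed representations (the paper's actual CDFG construction and comparison are more involved): each run is reduced to a set of observed dependence edges, and we count how many input data sets are needed before the union of per-run edge sets equals the union over all runs. The function name and the toy edge data are hypothetical.

```python
def runs_to_full_coverage(per_run_edges):
    """Return the number of runs whose cumulative union of observed
    dependence edges first equals the union over all runs
    (i.e., full coverage of the observed data or control flow)."""
    full = set().union(*per_run_edges)  # everything seen across all runs
    seen = set()
    for i, edges in enumerate(per_run_edges, start=1):
        seen |= edges
        if seen == full:
            return i
    return len(per_run_edges)

# Toy example: three runs, each yielding a set of (source, sink) edges.
runs = [
    {("a", "b"), ("b", "c")},   # run 1
    {("a", "b")},               # run 2: contributes no new edges
    {("a", "b"), ("c", "d")},   # run 3: a rarely manifesting may-dependence
]
print(runs_to_full_coverage(runs))  # → 3
```

In this toy case the third run contributes an edge seen nowhere else, so all three data sets are required for full coverage; the paper's observation that 55 to 100 (data flow) or 46 to 100 (control flow) inputs are needed corresponds to this count over the real traces.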