Automatic performance tuning (auto-tuning)
Introduction

As multicore platforms become ubiquitous, many software applications have to be parallelized and tuned for performance. In the past, one could afford to optimize code by hand for particular parallel machines. In the multicore world, with mass markets for parallel computers, this tuning must be automated. The reasons are manifold: the user community has grown significantly, as has the diversity of application areas for parallelism. In addition, the available parallel platforms differ in many respects, e.g., in the number or type of cores, the number of simultaneously executing hardware threads, the cache architecture, the available memory, or the operating system employed. Thus, the number of targets to optimize for has exploded. Even worse, optimizations made for one machine may cause a slowdown on another.

At the same time, multicore software has to remain portable and easy to maintain, which means that hardwired code optimizations must be avoided. Libraries with already tuned code bring only small improvements, as their optimizations are often narrowed down to specific problems or algorithms [11]. Moreover, libraries are highly platform-specific and require interfaces to be agreed upon. To achieve good overall performance, there seems to be no way around adapting the whole software architecture of a parallel program to the target architecture.

Automatic performance tuning (auto-tuning) [5], [10], [19] is a promising systematic approach in which parallel programs are written in a generic and portable way, while their performance remains comparable to that of manual optimization.

In this paper, we focus on the problem of how to connect an auto-tuner to a parallel application. We introduce Atune-IL, a general instrumentation language that is used throughout the development of a parallel program to define tunable parameters.
Our tuning instrumentation language is based on language-independent #pragma annotations that are inserted into the code of an existing parallel application. Atune-IL has powerful features that go far beyond related work in numerics [5], [19], [14]. Our approach aims to improve the software engineering of general-purpose parallel applications; it provides constructs to specify tunable variables, add meta-information on nested parallelism (to allow optimization on several abstraction layers), and vary the program architecture. All presented features are fully functional and have been positively evaluated in the context of a large commercial application analyzing biological data on an eight-core machine. Our approach reduced the code size required for instrumentation by 96% and the auto-tuner's search space by 99%.

The paper is organized as follows. Section 2 provides essential background knowledge on auto-tuning general-purpose parallel applications. Section 3 introduces Atune-IL, our tuning instrumentation language. Section 4 shows how program variants are generated automatically for tuning iterations. The mechan...