The ANTAREX project aims to express application self-adaptivity through a Domain Specific Language (DSL) and to manage and autotune applications at runtime for green and heterogeneous High Performance Computing (HPC) systems up to Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies, as well as their enforcement at runtime through application autotuning and resource and power management. Using a mini-app extracted from one of the project's application use cases, we show an initial exploration of application precision tuning enabled by the DSL.
The use of reduced precision to improve performance metrics such as computation latency and power consumption is a common practice in the embedded systems field. This practice is emerging as a new trend in High Performance Computing (HPC), especially when new error-tolerant applications are considered. However, standard compiler frameworks do not support automated precision customization, and manual tuning and code transformation remain the approach usually adopted in most domains. In recent years, researchers have been studying ways to improve the automation of this process. This article surveys this body of work, identifying the critical steps of this process, the most advanced tools available, and the open challenges in this research area. We conclude that, while several mature tools exist, there is still a gap to close, especially for tools based on static analysis rather than profiling, as well as for integration within mainstream, industry-strength compiler frameworks.
Architectures targeted at embedded systems often have limited floating point computation capabilities, and in many cases do not provide any hardware support for it. In this work, we propose a self-contained compiler transformation pass implemented within LLVM to perform floating point to fixed point conversion. This pass is used to optimize the scheduler of the MIOSIX embedded real-time operating system. We compare the proposed approach with the original floating point implementation, a hand-tuned fixed point one, and a solution based on a C++ library for fixed-point arithmetic. Our solution achieves speedups of up to 3.1× with respect to the original floating point implementation.
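To make the idea of floating point to fixed point conversion concrete, the following is a minimal sketch of fixed-point arithmetic on plain integers. It is not the LLVM pass described above; the Q16.16 format and the helper names (`to_fixed`, `to_float`, `fixed_mul`) are assumptions chosen for illustration.

```python
# Minimal fixed-point sketch. The Q16.16 layout (16 fractional bits) is an
# assumption for illustration, not the format chosen by the compiler pass.
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Convert a float to its nearest Q16.16 integer representation."""
    return int(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a Q16.16 integer back to a float."""
    return q / SCALE


def fixed_add(a: int, b: int) -> int:
    """Addition needs no rescaling: both operands share the same scale."""
    return a + b


def fixed_mul(a: int, b: int) -> int:
    """The raw product carries 2 * FRAC_BITS fractional bits, so shift back."""
    return (a * b) >> FRAC_BITS


if __name__ == "__main__":
    a, b = to_fixed(1.5), to_fixed(2.0)
    print(to_float(fixed_mul(a, b)))
    print(to_float(fixed_add(a, b)))
```

Python integers do not overflow, so this sketch does not model the bit-width limits that a real conversion has to respect when it selects the integer and fractional widths of each value.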
Many classes of applications, both in the embedded and high performance domains, can trade off the accuracy of the computed results for computation performance. One way to achieve such a trade-off is precision tuning, that is, modifying the data types used for the computation by reducing the bit width or by changing the representation from floating point to fixed point. We present a methodology for high-accuracy dynamic precision tuning based on the identification of input classes (i.e., classes of input datasets that benefit from similar optimizations). When a new input region is detected, the application kernels are re-compiled on the fly with the appropriate selection of parameters. In this way, we obtain a continuous optimization approach that enables the exploitation of reduced precision computation while progressively exploring the solution space, thus reducing the time required by compilation overheads. We provide tools to support the automation of the runtime part of the solution, leaving to the user only the task of identifying the input classes. Our approach provides a significant performance boost (up to 320%) on typical approximate computing benchmarks without meaningfully affecting the accuracy of the result, since the error always remains below 3%.
Heterogeneous System Architectures (HSAs) are becoming very attractive in the embedded and mobile markets because they make it possible to select the best computational resource among the available compute units and thus optimize the performance-per-Watt figure of merit. In this scenario, OpenCL is becoming the standard paradigm for heterogeneous computing, supporting the programming of all types of units with a single abstraction level. However, the choice of which resource to use, together with its architectural tuning, is still left to the programmer; the problem is exacerbated by the fact that the choice also depends on the actual conditions in which the system is operating. This work proposes a runtime controller, integrated in the Linux operating system (OS), that optimizes the power efficiency of a running OpenCL application by deciding the system configuration. Our experimental results over a set of applications from the Polybench suite on the Odroid XU3 board show that our controller obtains more than 90% of the power efficiency achievable via offline profiling.
The ANTAREX project relies on a Domain Specific Language (DSL) based on Aspect Oriented Programming (AOP) concepts to allow applications to enforce extra-functional properties such as energy efficiency and performance and to optimize Quality of Service (QoS) in an adaptive way. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies, as well as their enforcement at runtime through application autotuning and resource and power management. In this paper, we present an overview of the ANTAREX DSL and some of its capabilities through a number of examples, including how the DSL is applied in the context of one of the project use cases.