The ANTAREX project aims to express application self-adaptivity through a Domain Specific Language (DSL) and to manage and autotune applications at runtime for green and heterogeneous High Performance Computing (HPC) systems up to Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies, as well as their enforcement at runtime through application autotuning and resource and power management. Using a mini-app extracted from one of the project's application use cases, we show some initial exploration of application precision tuning enabled by the DSL.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). Created by The Institute of Electrical and Electronics Engineers (IEEE) for the benefit of humanity.
The use of reduced precision to improve performance metrics such as computation latency and power consumption is a common practice in the embedded systems field. This practice is emerging as a new trend in High Performance Computing (HPC), especially when new error-tolerant applications are considered. However, standard compiler frameworks do not support automated precision customization, and manual tuning and code transformation remain the approach usually adopted in most domains. In recent years, researchers have been studying ways to improve the automation of this process. This article surveys this body of work, identifying the critical steps of the process, the most advanced tools available, and the open challenges in this research area. We conclude that, while several mature tools exist, there is still a gap to close, especially for tools based on static analysis rather than profiling, as well as for integration within mainstream, industry-strength compiler frameworks.
Architectures targeted at embedded systems often have limited floating point computation capabilities, and in many cases do not provide any hardware support for it. In this work, we propose a self-contained compiler transformation pass implemented within LLVM to perform floating point to fixed point conversion. This pass is used to optimize the scheduler of the MIOSIX embedded real-time operating system. We compare the proposed approach with the original floating point implementation, a hand-tuned fixed point one, and a solution based on a C++ library for fixed-point arithmetic. Our solution achieves speedups of up to 3.1× with respect to the original floating point implementation.
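To make the idea of floating point to fixed point conversion concrete, the following is a minimal sketch of fixed-point arithmetic in a Q16.16 layout. It is not the LLVM pass described above: the format (16 fractional bits) and the helper names `to_fixed`, `to_float`, and `fixed_mul` are assumptions chosen for illustration, and Python's unbounded integers hide the overflow handling that a real 32-bit target would need.

```python
# Illustrative Q16.16 fixed-point helpers (hypothetical names, not the paper's pass).
FRAC_BITS = 16            # assumed number of fractional bits
SCALE = 1 << FRAC_BITS    # 2**16


def to_fixed(x: float) -> int:
    """Convert a float to its nearest Q16.16 integer representation."""
    return int(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a Q16.16 integer back to a float."""
    return q / SCALE


def fixed_add(a: int, b: int) -> int:
    """Addition needs no rescaling: both operands share the same scale."""
    return a + b


def fixed_mul(a: int, b: int) -> int:
    """The raw product carries 2 * FRAC_BITS fractional bits, so shift back."""
    return (a * b) >> FRAC_BITS


if __name__ == "__main__":
    a, b = to_fixed(1.5), to_fixed(2.25)
    print(to_float(fixed_mul(a, b)))   # product computed with integer ops only
    print(to_float(fixed_add(a, b)))   # sum computed with integer ops only
```

Once values are converted, every operation uses integer instructions only, which is why such a transformation can pay off on cores without a floating point unit.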
Many classes of applications, both in the embedded and high performance domains, can trade off the accuracy of the computed results for computation performance. One way to achieve such a trade-off is precision tuning, that is, modifying the data types used for the computation by reducing the bit width or by changing the representation from floating point to fixed point. We present a methodology for high-accuracy dynamic precision tuning based on the identification of input classes (i.e., classes of input datasets that benefit from similar optimizations). When a new input region is detected, the application kernels are re-compiled on the fly with the appropriate selection of parameters. In this way, we obtain a continuous optimization approach that enables the exploitation of reduced precision computation while progressively exploring the solution space, thus reducing the time spent on compilation overhead. We provide tools to support the automation of the runtime part of the solution, leaving to the user only the task of identifying the input classes. Our approach provides a significant performance boost (up to 320%) on typical approximate computing benchmarks without meaningfully affecting the accuracy of the result, since the error always remains below 3%.
The drug discovery process involves several tasks to be performed in vivo, in vitro and in silico. Molecular docking is a task typically performed in silico. It aims at finding the three-dimensional pose of a given molecule when it interacts with the target protein binding site. This task is often performed for the virtual screening of a huge set of molecules to find the most promising ones, which will be forwarded to the later stages of the drug discovery process. Given the huge complexity of the problem, molecular docking cannot be solved by exploring the entire space of the ligand poses. State-of-the-art approaches address the problem by sampling the space of the ligand poses to generate results within a reasonable time budget. In this work, we improve the geometric approach to molecular docking by introducing tunable approximations. In particular, we analyzed the original implementation and enriched it with tunable software knobs to explore and control the performance-accuracy trade-offs. We modeled the time-to-solution of the virtual screening task as a function of software knobs, input data features, and available computational resources. As a result, the application can autotune its configuration according to a user-defined time budget. We used a Mini-App derived from LiGenDock, a state-of-the-art molecular docking application, to validate the proposed approach. We ran the enhanced Mini-App on an HPC system using a very large database of pockets and ligands. The proposed approach exposes a time-to-solution interval spanning more than one order of magnitude with accuracy degradation of up to 30%, and more generally provides different accuracy levels according to the needs of the virtual screening campaign.
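Budget-driven autotuning of this kind can be sketched as follows: predict the time-to-solution of each knob configuration, discard those that exceed the budget, and keep the most accurate of the rest. The linear model (cost per ligand times ligand count, divided by node count), the `Config` fields, and the numbers in the usage example are all hypothetical; the abstract does not specify the form of the actual model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    """One setting of the software knobs (all values are illustrative)."""
    name: str
    secs_per_ligand: float   # assumed average cost on a single node
    accuracy: float          # assumed relative accuracy in [0, 1]


def predicted_time(cfg: Config, n_ligands: int, n_nodes: int) -> float:
    """Hypothetical model: work scales with ligands and splits evenly over nodes."""
    return cfg.secs_per_ligand * n_ligands / n_nodes


def pick_config(configs: List[Config], n_ligands: int, n_nodes: int,
                budget_s: float) -> Optional[Config]:
    """Return the most accurate configuration predicted to fit the time budget."""
    feasible = [c for c in configs
                if predicted_time(c, n_ligands, n_nodes) <= budget_s]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None


if __name__ == "__main__":
    knobs = [
        Config("fast", 0.01, 0.70),
        Config("medium", 0.05, 0.90),
        Config("exact", 0.20, 1.00),
    ]
    choice = pick_config(knobs, n_ligands=100_000, n_nodes=10, budget_s=600.0)
    print(choice.name if choice else "no configuration fits the budget")
```

Returning `None` when nothing fits keeps the decision explicit: the caller can then relax the budget or request more nodes instead of silently overrunning.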
Heterogeneous System Architectures (HSAs) are becoming very attractive in the embedded and mobile markets because they make it possible to select the best computational resource among the available compute units to optimize the performance-per-Watt figure of merit. In this scenario, OpenCL is becoming the standard paradigm for heterogeneous computing, supporting the programming of all types of units with a single abstraction level. However, the choice of which resource to use, together with its architectural tuning, is still left to the programmer; the issue is exacerbated by the fact that the choice also depends on the actual conditions in which the system is operating. This work proposes a runtime controller, integrated in the Linux operating system (OS), that optimizes the power efficiency of a running OpenCL application by deciding the system configuration. Our experimental results over a set of applications from the Polybench suite on the Odroid XU3 board show that our controller obtains a power efficiency of more than 90% of the one achievable via offline profiling.