Jana Hozzová scite author profile

A benchmark set of highly-efficient CUDA and OpenCL kernels and its dynamic autotuning with Kernel Tuning Toolkit

In recent years, the heterogeneity of both commodity and supercomputer hardware has increased sharply. Accelerators, such as GPUs or Intel Xeon Phi co-processors, are often key to improving the speed and energy efficiency of highly parallel codes. However, due to the complexity of heterogeneous architectures, optimizing code for a particular architecture, as well as porting it across architectures while maintaining comparable performance, can be extremely challenging. To address the challenges of performance optimization and performance portability, autotuning has attracted considerable interest. Autotuning performance-relevant source-code parameters allows applications to be tuned automatically without hard-coding optimizations, and thus helps keep performance portable. In this paper, we introduce a benchmark set of ten autotunable kernels for important computational problems, implemented in OpenCL or CUDA. Using our Kernel Tuning Toolkit, we show that with autotuning, most of the kernels reach near-peak performance on various GPUs and outperform baseline implementations on CPUs and Xeon Phis. Our evaluation also demonstrates that autotuning is key to performance portability. In addition to offline tuning, we introduce dynamic autotuning of code optimization parameters during application runtime. With dynamic tuning, the Kernel Tuning Toolkit enables applications to re-tune performance-critical kernels at runtime whenever needed, for example, when input data changes. Although it is generally believed that autotuning spaces tend to be too large to be searched during application runtime, we show that this is not necessarily the case when tuning spaces are designed rationally. Many of our kernels reach near-peak performance with moderately sized tuning spaces that can be searched at runtime with acceptable overhead. Finally, we demonstrate how dynamic performance tuning can be integrated into a real-world application from the cryo-electron microscopy domain.
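
The abstract describes re-tuning kernels at runtime with acceptable overhead. A minimal Python sketch of that idea follows. It assumes a tuning space built as the Cartesian product of parameter values, filtered by a validity constraint and searched by random search under a time budget. The names `build_tuning_space`, `tune_at_runtime` and `fake_kernel`, and the parameters `block` and `unroll`, are hypothetical. This is not the Kernel Tuning Toolkit API, and KTT's own searchers may work differently.

```python
import itertools
import random
import time
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]


def build_tuning_space(params: Dict[str, List[int]],
                       is_valid: Callable[[Config], bool]) -> List[Config]:
    """Enumerate the Cartesian product of parameter values, keeping valid configurations."""
    names = list(params)
    space = []
    for values in itertools.product(*(params[name] for name in names)):
        config = dict(zip(names, values))
        if is_valid(config):
            space.append(config)
    return space


def tune_at_runtime(space: List[Config],
                    run_kernel: Callable[[Config], float],
                    budget_s: float,
                    seed: int = 0) -> Tuple[Optional[Config], float]:
    """Random search without replacement that stops once the time budget is spent.

    Each call to run_kernel is assumed to do useful work for the application and to
    return the measured kernel time, so tuning steps double as regular invocations.
    """
    order = list(space)
    random.Random(seed).shuffle(order)
    best_config: Optional[Config] = None
    best_time = float("inf")
    deadline = time.perf_counter() + budget_s
    for config in order:
        if time.perf_counter() >= deadline:
            break  # budget exhausted: keep the best configuration found so far
        elapsed = run_kernel(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


# Hypothetical tuning parameters: work-group size and loop unrolling factor,
# restricted so that block * unroll does not exceed 1024.
space = build_tuning_space({"block": [32, 64, 128, 256, 512], "unroll": [1, 2, 4, 8]},
                           lambda c: c["block"] * c["unroll"] <= 1024)


def fake_kernel(c: Config) -> float:
    """Synthetic stand-in for a real kernel launch, fastest at block=128, unroll=4."""
    return abs(c["block"] - 128) / 128 + abs(c["unroll"] - 4) / 4


best, best_time = tune_at_runtime(space, fake_kernel, budget_s=1.0)
```

With this made-up 17-configuration space, the budget is large enough to visit every configuration. A budget of zero returns `(None, inf)`, which a caller could treat as "keep the current configuration".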

Using hardware performance counters to speed up autotuning convergence on GPUs

Filipovič, Hozzová, Nezarat, et al. 2022

Journal of Parallel and Distributed Computing

Exploiting Historical Data: Pruning Autotuning Spaces and Estimating the Number of Tuning Steps

Oľha, Hozzová, Fousek, et al. 2020

Autotuning, the automatic tuning of applications to provide performance portability, has received increased attention in the research community, especially in high-performance computing. Ensuring high performance on a variety of hardware usually requires modifications to the code, often via different values of a selected set of parameters, such as tiling size, loop unrolling factor, or data layout. However, the search space of all possible combinations of these parameters can be large, so the benefits of autotuning may be outweighed by its cost, especially with dynamic tuning. Therefore, estimating the tuning time in advance, or shortening it, is very important in dynamic tuning applications. We have found that certain properties of tuning spaces do not vary much when hardware is changed. In this paper, we demonstrate that it is possible to use historical data to reliably predict the number of tuning steps necessary to find a well-performing configuration, and to reduce the size of the tuning space. We evaluate our hypotheses on a number of HPC benchmarks written in CUDA and OpenCL, using several different generations of GPUs and CPUs.
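
One way to make the idea of predicting the number of tuning steps concrete uses two assumptions: random search without replacement, and a fraction of well-performing configurations that carries over from historical hardware. Under these assumptions, the expected number of steps until the first well-performing configuration is (N + 1) / (K + 1) for a space of N configurations containing K good ones. The sketch below uses hypothetical function names and made-up historical timings. It computes that estimate and the step count for a target success probability. It illustrates the idea under these assumptions and is not necessarily the model used in the paper.

```python
from math import comb
from typing import Sequence


def count_well_performing(times: Sequence[float], tolerance: float) -> int:
    """Number of configurations whose runtime is within `tolerance` of the best one."""
    best = min(times)
    return sum(1 for t in times if t <= best * (1.0 + tolerance))


def expected_steps(n: int, k: int) -> float:
    """Expected number of random-search steps (without replacement) needed to hit the
    first of k well-performing configurations in a space of n: (n + 1) / (k + 1)."""
    return (n + 1) / (k + 1)


def steps_for_probability(n: int, k: int, p: float) -> int:
    """Smallest number of steps s such that at least one of the k well-performing
    configurations is found with probability >= p."""
    for s in range(1, n + 1):
        # Probability that all s sampled configurations are poorly performing.
        miss = comb(n - k, s) / comb(n, s)
        if 1.0 - miss >= p:
            return s
    return n


# Hypothetical historical runtimes (ms) of a 10-configuration space on an older GPU.
history = [1.0, 1.05, 1.2, 2.0, 3.0, 1.08, 5.0, 4.0, 2.5, 1.5]
n, k = len(history), count_well_performing(history, tolerance=0.1)
print(k, expected_steps(n, k), steps_for_probability(n, k, p=0.9))  # → 3 2.75 5
```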

Property Map Collective Variable as a Useful Tool for a Force Field Correction

Trapl, Krupička, Vladimír, et al. 2022

J. Chem. Inf. Model.

The accuracy of biomolecular simulations depends on the accuracy of an empirical molecular mechanics potential known as a force field: a set of parameters and expressions used to estimate the potential energy from atomic coordinates. Accurate parametrization of force fields for small organic molecules is a challenge due to their high diversity. One possible approach is to apply a correction to existing force fields. Here, we propose an approach to estimating a density functional theory (DFT)-derived force field correction that is calculated during a molecular dynamics run without significantly affecting its speed. Using the formula known as a property map collective variable, we approximate the force field correction by a weighted average of the correction calculated only for a small series of reference structures. To validate this method, we used seven AMBER force fields and show how one force field can be converted to behave like another. We also present the force field correction for the important anticancer drug Imatinib as a use-case example. Our method appears to be suitable for adjusting the force field for general drug-like molecules. We provide a pipeline that generates the correction; this pipeline is available at .
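
The property map formula referred to above can be sketched as P(x) = Σ_i P_i exp(-λ d_i(x)) / Σ_i exp(-λ d_i(x)). Here, P_i is the correction precomputed for reference structure i, d_i(x) is a distance between the current structure x and that reference, and λ controls how sharply the weights decay. The Python sketch below assumes that d_i is the mean squared deviation of already-superimposed coordinates. It computes only the corrected value, not the forces that a molecular dynamics engine would also need. The function names and toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def msd(a: Coords, b: Coords) -> float:
    """Mean squared deviation of two structures assumed to be already superimposed."""
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(a, b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return total / len(a)


def property_map(x: Coords, refs: List[Coords], props: List[float], lam: float) -> float:
    """Weighted average of reference properties P_i with weights exp(-lam * d_i(x))."""
    d = [msd(x, r) for r in refs]
    d_min = min(d)  # shifting by d_min avoids underflow and cancels in the ratio
    w = [math.exp(-lam * (di - d_min)) for di in d]
    return sum(wi * pi for wi, pi in zip(w, props)) / sum(w)


# Two hypothetical one-atom reference structures with precomputed corrections.
refs = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]]
corrections = [0.0, 10.0]
print(property_map([(0.5, 0.0, 0.0)], refs, corrections, lam=50.0))  # → 5.0
```

A structure equidistant from both references receives the plain average of their corrections. A structure sitting on one reference receives essentially that reference's correction.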

Simulation of Ligand Transport in Receptors Using CaverDock

Hozzová, Vávra, Bednář, et al. 2021

Using hardware performance counters to speed up autotuning convergence on GPUs

Filipovič, Hozzová, Nezarat, et al. 2021

Preprint

Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimizing code for a particular type of hardware and specific data characteristics can be extremely challenging. Autotuning of performance-relevant source-code parameters allows for automatic optimization of applications and keeps their performance portable. Although the autotuning process typically results in code speed-up, searching the tuning space can bring unacceptable overhead if (i) the tuning space is vast and full of poorly performing implementations, or (ii) the autotuning process has to be repeated frequently because of changes in processed data or migration to different hardware. In this paper, we introduce a novel method for searching tuning spaces. The method takes advantage of hardware performance counters (also known as profiling counters) collected during empirical tuning. These counters are used to navigate the search towards faster implementations. The method requires the tuning space to be sampled on any GPU. It builds a problem-specific model, which can be used during autotuning on various, even previously unseen, inputs or GPUs. Using a set of five benchmarks, we experimentally demonstrate that our method can speed up autotuning when an application needs to be ported to different hardware or when it needs to process data with different characteristics. We also compare our method to the state of the art and show that it is superior in terms of the number of search steps and typically outperforms other searches in terms of convergence time.
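
To illustrate the general idea of letting profiling counters steer the search, the toy Python sketch below works in three steps. It profiles a configuration, treats its most heavily utilized counter as the bottleneck, and moves to the unvisited configuration that a model predicts will lower that counter most. This is a deliberately simplified, hypothetical illustration. The function names, the bottleneck rule, the counter names `dram` and `compute`, and the perfect predictive model are all assumptions, not the method described in the paper.

```python
from typing import Callable, Dict, List, Tuple

Config = Tuple[int, ...]
Counters = Dict[str, float]


def counter_guided_search(space: List[Config],
                          profile: Callable[[Config], Tuple[float, Counters]],
                          predict: Callable[[Config], Counters],
                          steps: int) -> Tuple[Config, float]:
    """Toy profile-guided search over a tuning space.

    profile(c) runs configuration c and returns (runtime, measured counters);
    predict(c) returns counter values estimated by a model built beforehand.
    """
    unvisited = list(space)
    current = unvisited.pop(0)  # assumption: start from the first configuration
    best_config, best_time = current, float("inf")
    for _ in range(steps):
        runtime, counters = profile(current)
        if runtime < best_time:
            best_config, best_time = current, runtime
        if not unvisited:
            break
        # Treat the most heavily utilized unit as the bottleneck ...
        bottleneck = max(counters, key=counters.get)
        # ... and move to the unvisited configuration predicted to relieve it most.
        current = min(unvisited, key=lambda c: predict(c)[bottleneck])
        unvisited.remove(current)
    return best_config, best_time


# Hypothetical utilizations (0..1) of DRAM bandwidth and compute units per block size.
UTIL = {(32,): {"dram": 0.9, "compute": 0.3}, (64,): {"dram": 0.6, "compute": 0.4},
        (128,): {"dram": 0.4, "compute": 0.45}, (256,): {"dram": 0.5, "compute": 0.7}}


def profile(c: Config) -> Tuple[float, Counters]:
    """Synthetic runtime: the kernel is as slow as its most saturated unit."""
    return max(UTIL[c].values()), UTIL[c]


best, t = counter_guided_search(list(UTIL), profile, predict=lambda c: UTIL[c], steps=2)
```

With these synthetic utilizations and a perfect model, the toy search reaches the fastest of the four configurations after profiling only two of them. A real model built on one GPU would only approximate the counters on another.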

Searching CUDA code autotuning spaces with hardware performance counters: data from benchmarks running on various GPU architectures

et al. 2021

Improving ligand transport trajectory within flexible receptor in CaverDock

Němcová, Hozzová, Filipovič 2022
