2018
DOI: 10.1002/cpe.4667

Design considerations for GPU‐aware collective communications in MPI

Abstract: GPU accelerators have established themselves in state‐of‐the‐art clusters by offering high performance and energy efficiency. In such systems, efficient inter‐process GPU communication is of paramount importance to application performance. This paper investigates various algorithms in conjunction with the latest GPU features to improve GPU collective operations. First, we propose a GPU Shared Buffer‐aware (GSB) algorithm and a Binomial Tree Based (BTB) algorithm for GPU collectives on single‐GPU no…
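
The abstract names a Binomial Tree Based (BTB) algorithm for GPU collectives. As a rough illustration of the general idea only (not the paper's GSB or BTB implementations), the sketch below runs a binomial-tree broadcast directly on a GPU buffer by handing device pointers to MPI point-to-point calls. It assumes root rank 0, a CUDA-aware MPI library that accepts device pointers, and that d_buf was allocated with cudaMalloc; the function name gpu_bcast_binomial is hypothetical.

#include <mpi.h>

/* Binomial-tree broadcast of a device buffer, root fixed at rank 0.
 * Sketch under the assumption of a CUDA-aware MPI that can send/receive
 * directly from GPU memory; d_buf is a cudaMalloc'd pointer. */
static void gpu_bcast_binomial(void *d_buf, int count, MPI_Datatype type,
                               MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Receive phase: the lowest set bit of the rank identifies the round
     * in which this rank receives the data from its parent. */
    int mask = 1;
    while (mask < size) {
        if (rank & mask) {
            MPI_Recv(d_buf, count, type, rank - mask, 0, comm,
                     MPI_STATUS_IGNORE);
            break;
        }
        mask <<= 1;
    }

    /* Send phase: forward to children at successively lower bit positions. */
    mask >>= 1;
    while (mask > 0) {
        if (rank + mask < size)
            MPI_Send(d_buf, count, type, rank + mask, 0, comm);
        mask >>= 1;
    }
}

A tree of this shape finishes in ceil(log2(p)) communication rounds for p processes, which is why tree-based schemes are attractive for latency-bound collectives; the paper's GSB and BTB designs go further by exploiting GPU shared buffers and other GPU features, which this sketch does not attempt to reproduce.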

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1

Citation Types

0
2
0

Year Published

2018
2018
2022
2022

Publication Types

Select...
3
3

Relationship

0
6

Authors

Journals

Cited by 6 publications (2 citation statements) | References 24 publications

Citation statements:

“…In particular, this selection includes four interesting works. Two of them were contributions from the last two workshop editions (HUCAA 2015 and 2016, both collocated with the International Conference on Parallel Processing ‐ ICPP'15 and ICPP'16).…”
Section: Themes of This Special Issue
confidence: 99%
“…Scientists usually rely on either MPI parallelism or GPU parallelism alone. However, MPI and GPUs can be combined for large-scale computing tasks in many fields, and heterogeneous MPI-GPU computing has been widely used [29][30][31]. In computational fluid dynamics, Choi et al. used a floating-point compression algorithm to optimize GPU memory capacity in a heterogeneous MPI-GPU implementation [32], and Lai et al. developed a heterogeneous parallel program combining MPI and CUDA for CFD applications on high-performance computing clusters, greatly improving computational efficiency [33].…”
Section: Introduction
confidence: 99%
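
As a minimal sketch of the MPI+CUDA combination the statement above describes, the following illustrative program binds each MPI rank to one local GPU, performs placeholder per-rank work in a kernel, and combines the partial results with MPI_Allreduce. It is not code from any of the cited CFD applications; the kernel fill and all variable names are hypothetical.

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Placeholder for the real per-rank computation. */
__global__ void fill(double *x, int n, double v)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = v;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size, ndev;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Bind each rank to one of the GPUs visible on its node. */
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);

    const int n = 1 << 20;
    double *d_x;
    cudaMalloc((void **)&d_x, n * sizeof(double));

    /* Local GPU work: each rank writes its own value. */
    fill<<<(n + 255) / 256, 256>>>(d_x, n, rank + 1.0);
    cudaDeviceSynchronize();

    /* Bring one partial result back to the host and combine across ranks.
     * A CUDA-aware MPI could reduce directly from device memory instead. */
    double local, global;
    cudaMemcpy(&local, d_x, sizeof(double), cudaMemcpyDeviceToHost);
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("combined result across %d ranks: %f\n", size, global);

    cudaFree(d_x);
    MPI_Finalize();
    return 0;
}

Built with nvcc and an MPI compiler wrapper (for example, using mpicxx as the host compiler), this is the standard one-rank-per-GPU layout that heterogeneous MPI-GPU codes such as those cited above build on.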