2022
DOI: 10.48550/arxiv.2202.12756
Preprint

Scalable Multi-node Fast Fourier Transform on GPUs

Abstract: In this paper, we present the details of our multi-node GPU-FFT library, as well as its scaling on the Selene HPC system. Our library employs slab decomposition for data division and MPI for communication among GPUs. We performed GPU-FFT on 1024³, 2048³, and 4096³ grids using a maximum of 512 A100 GPUs. We observed good scaling for the 4096³ grid with 64 to 512 GPUs. We report that the timings of multicore FFT of a 1536³ grid with 196608 cores of a Cray XC40 are comparable to those of GPU-FFT of a 2048³ grid with 128 GPU…
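
The abstract names slab decomposition and MPI communication but gives no implementation detail, so the following is only a minimal single-process sketch of the general slab-decomposed 3D FFT pattern, not the authors' library. It assumes NumPy in place of GPU kernels, uses list reshuffling as a stand-in for an MPI all-to-all exchange, and the function name `slab_fft3d` and its parameters are hypothetical.

```python
import numpy as np


def slab_fft3d(field, nranks):
    """Simulate a slab-decomposed 3D FFT over `nranks` hypothetical ranks.

    Illustration only: every "rank" is an entry in a Python list, and the
    all-to-all exchange is done with array slicing instead of MPI.
    """
    n0, n1, _ = field.shape
    # Slab decomposition needs the split axes to divide evenly among ranks.
    assert n0 % nranks == 0 and n1 % nranks == 0

    # Step 1: each rank owns a slab of n0 / nranks planes along axis 0.
    slabs = np.split(field.astype(np.complex128), nranks, axis=0)

    # Step 2: local 2D FFT over the two axes each rank holds in full.
    slabs = [np.fft.fft2(s, axes=(1, 2)) for s in slabs]

    # Step 3: all-to-all exchange (assumed stand-in for MPI communication).
    # Rank `src` cuts its slab into nranks blocks along axis 1 and sends
    # block `dst` to rank `dst`; each rank then holds all of axis 0 and a
    # 1 / nranks share of axis 1.
    blocks = [np.split(s, nranks, axis=1) for s in slabs]
    transposed = [
        np.concatenate([blocks[src][dst] for src in range(nranks)], axis=0)
        for dst in range(nranks)
    ]

    # Step 4: 1D FFT along axis 0, which is now local to every rank.
    transposed = [np.fft.fft(t, axis=0) for t in transposed]

    # Gather the pieces along axis 1 so the result can be checked.
    return np.concatenate(transposed, axis=1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 8))
matches_reference = np.allclose(slab_fft3d(x, nranks=4), np.fft.fftn(x))
```

In a real multi-node run the exchange in step 3 is the only stage that moves data between devices, which is why communication tends to dominate the scaling behaviour of this decomposition.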
