Proceedings of the 26th ACM International Conference on Supercomputing 2012
DOI: 10.1145/2304576.2304604

On the communication complexity of 3D FFTs and its implications for Exascale

Abstract: This paper revisits the communication complexity of large-scale 3D fast Fourier transforms (FFTs) and asks what impact trends in current architectures will have on FFT performance at exascale. We analyze both memory hierarchy traffic and network communication to derive suitable analytical models, which we calibrate against current software implementations; we then evaluate models to make predictions about potential scaling outcomes at exascale, based on extrapolating current technology trends. Of particular int…

Citations: cited by 50 publications (44 citation statements)
References: 36 publications (30 reference statements)
“…Pippig [14] reported a comparative study of FFTW [15], PFFT [14], and P3DFFT [16] using an IBM Blue Gene machine. Similar investigations using Intel-Infiniband clusters were reported in [13,17,18]. In general, the majority of the execution time is spent in communication.…”
Section: Introduction (supporting)
confidence: 80%
“…) cache misses, for each transferring line of size $L$ [13,14]. This bound is optimal, matching the lower bound by Hong and Kung [15] when $\sqrt[3]{N}$ is an exact power of two.…”
Section: Memory Access Costs (supporting)
confidence: 71%
“…We use the AccFFT package [45] (a parallel, open-source FFT library for CPU/GPU architectures developed in our group) to apply the spectral operators. AccFFT dictates the data layout: We partition the data based on a pencil decomposition for 3D FFTs [35,50]: Let $n_p = p_1 p_2$ denote the number of MPI tasks; then each MPI task gets $(n_1/p_1) \times (n_2/p_2) \times n_3$ grid points (i.e., we partition the domain $\Omega_h$ along the $x_1$- and $x_2$-axis into subdomains $\Omega_h^i$, $i = 1, \ldots$…”
Section: Newton (mentioning)
confidence: 99%
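
The pencil decomposition described in the statement above can be illustrated with a short sketch. It computes the local block shape that one task would own on a p1 × p2 process grid. The function name `pencil_local_shape`, the row-major mapping from rank to grid coordinates, and the rule for spreading a remainder when n1 or n2 is not divisible by p1 or p2 are assumptions made for illustration; they are not taken from AccFFT or from the cited paper.

```python
def pencil_local_shape(n, p, rank):
    """Return the local (n1_loc, n2_loc, n3) block for `rank` on a p1 x p2 grid.

    Hypothetical helper: the rank layout and remainder handling below are
    illustrative assumptions, not the layout of any specific FFT library.
    """
    n1, n2, n3 = n
    p1, p2 = p
    if not 0 <= rank < p1 * p2:
        raise ValueError("rank must lie in [0, p1 * p2)")

    # Assumed row-major mapping of ranks onto the process grid: rank = i * p2 + j.
    i, j = divmod(rank, p2)

    def split(length, parts, index):
        # Give one extra point to the first `length % parts` blocks.
        base, extra = divmod(length, parts)
        return base + (1 if index < extra else 0)

    # The x1- and x2-axes are partitioned; the x3-axis stays whole (a "pencil").
    return (split(n1, p1, i), split(n2, p2, j), n3)


if __name__ == "__main__":
    # Each task holds (n1/p1) x (n2/p2) x n3 points when the sizes divide evenly.
    print(pencil_local_shape((256, 256, 256), (4, 8), rank=0))
```

When n1 and n2 are multiples of p1 and p2, every task receives exactly (n1/p1) × (n2/p2) × n3 grid points, matching the count quoted above; otherwise the sketch assigns the leftover points to the lowest-indexed blocks.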