2017 46th International Conference on Parallel Processing Workshops (ICPPW) 2017
DOI: 10.1109/icppw.2017.44

Performance Analysis and Optimization of the FFTXlib on the Intel Knights Landing Architecture

Abstract: In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transformation (FFT) kernel of Quantum ESPRESSO, when scaling to a full KNL node. Increased performance in the FFTXlib will likewise increase the performance of the entire Quantum ESPRESSO code, one of the most widely used plane-wave DFT codes in the materials science community. Our approach focuses on, first, overlapping computation and communication and, second, decreasing resource contention for higher compute effici…

citations
Cited by 3 publications
(9 citation statements)
references
References 6 publications
“…Wagner et al 63 cache hit rate is already high. However, for MF-SGD, using both AVX-512 and MCDRAM boosts performance significantly, as AVX-512 instructions produce many memory accesses in each cycle which benefit from high bandwidth of MCDRAM.…”
Section: High-performance Computing (mentioning)
confidence: 99%
“…Table 5 shows the parallel-programming languages used by different works. In OpenMP, both static 52,54,63,79 and dynamic 54,63,79 scheduling schemes have been used. Also, some works use OpenMP task pragmas for parallelization.…”
Section: Hou Et Al 88 Present a Technique For Automatically Generatin… (mentioning)
confidence: 99%
“…It consists of various infrastructure nodes connected by a high-performance interconnect, including computing nodes, storage systems, […] nodes, management nodes, datamover (DM) nodes, and web servers. The computing nodes that perform parallel processes at high speeds consist of 8305 Intel many-core processor Knights Landing Nodes (KNLs) [10,11] and 132 server processor Skylake Nodes (SKLs) [12]. Each KNL node has 68 cores per socket and 96 GB (16 GB × 6) memory, while each SKL node has two sockets, each of which […] cores and 192 GB (16 GB × 12) memory.…”
Section: Hardware Configuration (mentioning)
confidence: 99%