Performance Analysis and Optimization of the FFTXlib on the Intel Knights Landing Architecture

Wagner, Michael; López, Víctor; Morillo, Julian; Cavazzoni, Carlo; Affinito, Fabio; Giménez, Judit; Labarta, Jesús

doi:10.1109/icppw.2017.44

Cited by 3 publications (9 citation statements)

References 6 publications


“…Wagner et al. [63] cache hit rate is already high. However, for MF-SGD, using both AVX-512 and MCDRAM boosts performance significantly, as AVX-512 instructions produce many memory accesses in each cycle which benefit from high bandwidth of MCDRAM.…”

Section: High-performance Computing (mentioning)
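The excerpt attributes the MCDRAM benefit to AVX-512 code issuing many memory accesses per cycle. A minimal sketch of why, assuming a STREAM-style triad as a stand-in (this is not the MF-SGD kernel or any code from Wagner et al.; the function name `triad` is hypothetical): each 16-lane AVX-512 iteration loads two 64-byte vectors and stores one, so at least 192 bytes of memory traffic accompany only 32 floating-point operations, and throughput is bounded by memory bandwidth rather than arithmetic. The scalar loop handles the tail and is the only path when the compiler does not target AVX-512.

```c
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical bandwidth-bound kernel: a[i] = b[i] + s * c[i]. */
void triad(float *a, const float *b, const float *c, float s, size_t n) {
    size_t i = 0;
#ifdef __AVX512F__
    /* Broadcast the scalar into all 16 lanes of a 512-bit register. */
    __m512 vs = _mm512_set1_ps(s);
    /* Each iteration: two 64-byte loads, one FMA (16 lanes), one 64-byte store. */
    for (; i + 16 <= n; i += 16) {
        __m512 vb = _mm512_loadu_ps(b + i);
        __m512 vc = _mm512_loadu_ps(c + i);
        _mm512_storeu_ps(a + i, _mm512_fmadd_ps(vs, vc, vb)); /* vs*vc + vb */
    }
#endif
    /* Scalar remainder (or the whole loop when AVX-512 is not enabled). */
    for (; i < n; ++i)
        a[i] = b[i] + s * c[i];
}
```

Compiling with, e.g., `gcc -O2 -mavx512f` enables the vector path on a KNL; without that flag the same source falls back to the scalar loop. On KNL in flat mode, running such a kernel with its arrays placed on the MCDRAM NUMA node (for example via `numactl --preferred=<MCDRAM node>`) is one way to observe the bandwidth effect the excerpt describes.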


“…Table 5 shows the parallel-programming languages used by different works. In OpenMP, both static [52, 54, 63, 79] and dynamic [54, 63, 79] scheduling schemes have been used. Also, some works use OpenMP task pragmas for parallelization.…”

Section: Hou et al. 88 Present a Technique for Automatically Generatin… (mentioning)
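The excerpt only names the two OpenMP loop-scheduling schemes. As a minimal illustration (not code from the surveyed works; `work`, `sum_static`, and `sum_dynamic` are hypothetical names), the loop below has a per-iteration cost that grows with the index. `schedule(static)` splits the iterations into contiguous blocks up front, while `schedule(dynamic, 16)` hands out 16-iteration chunks to threads as they become free, which usually balances this kind of uneven load better at the price of some runtime scheduling overhead.

```c
/* Hypothetical imbalanced workload: iteration i performs i units of work. */
static double work(long i) {
    double acc = 0.0;
    for (long k = 0; k < i; ++k)
        acc += 1.0;            /* stands in for real per-iteration work */
    return acc;                /* equals (double)i */
}

/* Static schedule: iteration blocks are assigned to threads before the loop runs. */
double sum_static(long n) {
    double total = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (long i = 0; i < n; ++i)
        total += work(i);
    return total;
}

/* Dynamic schedule: threads request chunks of 16 iterations on demand. */
double sum_dynamic(long n) {
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (long i = 0; i < n; ++i)
        total += work(i);
    return total;
}
```

Built with `-fopenmp`, both loops run in parallel; without it the pragmas are ignored and both functions run serially. Either way they return the same value (499500.0 for `n = 1000`), since every partial sum is an exactly representable integer; only the distribution of work across threads differs.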


“…Also, some works use OpenMP task pragmas for parallelization [33, 52, 63].
Table 5. Parallelization language/library used on Phi
  MPI: 4, 9, 12, 16, 22, 24, 25, 28, 35, 36, 43, 44, 50, 54, 59, 60, 63, 66, 76, 82, 86
  Others: Pthreads 11, 23, 76, 84, 91, 95; Intel TBB 8, 48; Cilk Plus 48; OpenCL 49, 100
Chatzikonstantis et al. [28] study inferior-olivary nucleus (InfOli) simulation, which is used in brain modeling. They accelerate the simulation using (i) MPI, (ii) OpenMP, and (iii) hybrid MPI+OpenMP.…”

Section: Hou et al. 88 Present a Technique for Automatically Generatin… (mentioning)


“…, 12–14, 16–18, 20, 21, 24, 27–30, 33–37, 39, 42–48, 50, 52, 55, 57, 59, 63, 66, 84, 86, 90, 92, 93, 95, 99; Intel MKL: 2, 17, 19, 31, 32, 40, 93, 99 …”

mentioning


A survey on evaluating and optimizing performance of Intel Xeon Phi

Mittal

2020

Concurrency and Computation


Summary Intel's Xeon Phi combines the parallel processing power of a many‐core accelerator with the programming ease of CPUs. In this paper, we present a survey of works that study the architecture of Phi and use it as an accelerator for a broad range of applications. We review performance optimization strategies as well as the factors that bottleneck the performance of Phi. We also review works that perform comparison or collaborative execution of Phi with CPUs and GPUs. This paper will be useful for researchers and developers in the area of computer‐architecture and high‐performance computing.


“…It consists of various infrastructure nodes connected [by a] high-performance interconnect, including computing nodes, storage systems, […] nodes, management nodes, data mover (DM) nodes, and web servers. The computing nodes that perform parallel processes at high speeds consist of 8305 Intel many-core processor Knights Landing nodes (KNLs) [10,11] and 132 […] server processor Skylake nodes (SKLs) [12]. Each KNL node has 68 cores per socket [and] 96 GB (16 GB × 6) memory, while each SKL node has two sockets, each of which [has …] cores and 192 GB (16 GB × 12) memory.…”

Section: Hardware Configuration (mentioning)


Improvements to Supercomputing Service Availability Based on Data Analysis

Lee

Kwon

et al. 2021

Applied Sciences


As the demand for high-performance computing (HPC) resources has increased in the field of computational science, an inevitable consideration is service availability in large cluster systems such as supercomputers. In particular, the factor that most affects availability in supercomputing services is the job scheduler utilized for allocating resources. Consequent to submitting user data through the job scheduler for data analysis, 25.6% of jobs failed because of program errors, scheduler errors, or I/O errors. Based on this analysis, we propose a K-hook method for scheduling to increase the success rate of job submissions and improve the availability of supercomputing services. By applying this method, the job-submission success rate was improved by 15% without negatively affecting users’ waiting time. We also achieved a mean time between interrupts (MTBI) of 24.3 days and maintained average system availability at 97%. As this research was verified on the Nurion supercomputer in a real service environment, the value of the research is expected to be found in significant service improvements.


Quantum ESPRESSO: One Further Step toward the Exascale

Carnimeo,

Affinito,

Baroni

et al. 2023

J. Chem. Theory Comput.

Self Cite


We review the status of the Quantum ESPRESSO software suite for electronic-structure calculations based on plane waves, pseudopotentials, and density-functional theory. We highlight the recent developments in the porting to GPUs of the main codes, using an approach based on OpenACC and CUDA Fortran offloading. We describe, in particular, the results achieved on linear-response codes, which are one of the distinctive features of the Quantum ESPRESSO suite. We also present extensive performance benchmarks on different GPU-accelerated architectures for the main codes of the suite.
