2018
DOI: 10.1007/978-3-319-95168-3_31

Practical Implementation of Lattice QCD Simulation on SIMD Machines with Intel AVX-512

Abstract: We investigate implementation of lattice Quantum Chromodynamics (QCD) code on the Intel AVX-512 architecture. The most time consuming part of the numerical simulations of lattice QCD is a solver of linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of AVX-512, such as using intrinsics and manual prefetching, for the matrix multiplication. Based on experience on t…

Citations: cited by 12 publications (13 citation statements)
References: 12 publications
“…] which is written in C++ based on the object-oriented design. Bridge++ has been used to investigate a recipe of tuning on Intel AVX-512 architectures [21,22].…”
Section: Multi-grid Algorithm (mentioning)
confidence: 99%
“…For the smaller local volume case, the domain-decomposed operator is faster than the full operator as expected from the absence of communications. The tuning described in [22] for full operator works quite efficiently for a larger local volume so that its performance exceeds Table 1. Elapsed time for the multi-grid solver.…”
Section: Performance on Intel Xeon Phi Cluster (mentioning)
confidence: 99%
“…This requires that the lattice size in x-direction must be a multiple of 8. The details of the tuning with the AVX-512 instruction set were presented in [7]. For Armv8.2-A-SVE, we adopt a different packing: as depicted in the right panel of Fig.…”
Section: SIMD Architectures: Intel AVX-512 and Fujitsu A64FX (mentioning)
confidence: 99%
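
The multiple-of-8 constraint quoted above can be illustrated with a minimal C++ sketch. It assumes that 8 consecutive x-sites are packed into the 8 lanes of one 512-bit vector, and that each site carries `ncomp` real components stored per vector as `[component][lane]`. The names `kLanes`, `pack_index`, and `pack_field` are hypothetical and are not taken from Bridge++; the actual layout used in the cited work may differ.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumption for illustration: 8 lanes per 512-bit vector.
constexpr std::size_t kLanes = 8;

// Offset of component c of site (x, y) in the packed (vector-major) array.
// Sites x = 8k .. 8k+7 of one row share a vector; within a vector the data
// is ordered [component][lane], so each component fills one SIMD register.
inline std::size_t pack_index(std::size_t x, std::size_t y, std::size_t c,
                              std::size_t nx, std::size_t ncomp) {
    if (nx % kLanes != 0) {
        throw std::invalid_argument("nx must be a multiple of the lane count");
    }
    const std::size_t vec  = y * (nx / kLanes) + x / kLanes;  // which vector
    const std::size_t lane = x % kLanes;                      // lane in vector
    return (vec * ncomp + c) * kLanes + lane;
}

// Repack a site-major field, indexed as [(y * nx + x) * ncomp + c],
// into the packed layout described above.
inline std::vector<double> pack_field(const std::vector<double>& site_major,
                                      std::size_t nx, std::size_t ny,
                                      std::size_t ncomp) {
    std::vector<double> packed(site_major.size());
    for (std::size_t y = 0; y < ny; ++y) {
        for (std::size_t x = 0; x < nx; ++x) {
            for (std::size_t c = 0; c < ncomp; ++c) {
                packed[pack_index(x, y, c, nx, ncomp)] =
                    site_major[(y * nx + x) * ncomp + c];
            }
        }
    }
    return packed;
}
```

With this layout, one aligned 512-bit load fetches the same component for 8 neighbouring x-sites, which is why a row length that is not a multiple of 8 would leave a partially filled vector.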
“…Recent supercomputers, however, adopt a variety of architecture: multi-core parallel machines with wide SIMD (A64FX and Intel processors), and clusters with accelerator devices such as GPUs, PEZY-SC, and vector processors (NEC SX-Aurora). Soon after the first public release of Bridge++ in 2012 [2], we had started to investigate possible extensions of our code to exploit these new architectures while keeping the readability and portability, as well as to develop tuning techniques for them [3,4,5,6,7,8]. Recently we have constructed a framework to incorporate the tuned codes as an alternative part to the previously developed Bridge++ code, and decided to release it as version 2.0.…”
Section: Introduction (mentioning)
confidence: 99%