2017 Fifth International Symposium on Computing and Networking (CANDAR) 2017
DOI: 10.1109/candar.2017.66

Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing

Abstract: We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel Xeon Phi Knights Landing (KNL). The most time-consuming part of lattice QCD simulations is the solver of a linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of KNL, such as using intrinsics and manual prefetching, and apply them to the matrix multiplication and iterative solver…
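The abstract does not say which iterative method is used. The sketch below is a generic conjugate gradient (CG). It assumes a symmetric positive-definite matrix stored in compressed sparse row (CSR) form. It shows why the sparse matrix-vector product dominates the cost: the product runs once per iteration, and everything else is cheap vector arithmetic.

The names `csr_matrix`, `csr_matvec` and `cg_solve` are hypothetical, and this is not the authors' solver. Lattice QCD fermion matrices are generally not symmetric, so production codes use variants such as CG on the normal equation, or BiCGStab.

```c
#include <stdlib.h>

/* Hypothetical CSR container for an n x n sparse matrix. */
typedef struct {
    int n;               /* number of rows and columns */
    const int *row_ptr;  /* n + 1 offsets: row i occupies [row_ptr[i], row_ptr[i+1]) */
    const int *col_idx;  /* column index of each stored entry */
    const double *val;   /* value of each stored entry */
} csr_matrix;

/* y = A x: the sparse matrix-vector product, called once per CG iteration. */
static void csr_matvec(const csr_matrix *A, const double *x, double *y) {
    for (int i = 0; i < A->n; ++i) {
        double sum = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            sum += A->val[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}

static double dot(int n, const double *a, const double *b) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* Conjugate gradient for a symmetric positive-definite A.
   x holds the initial guess on entry and the solution on exit.
   Stops when |r| <= tol * |b|; returns the iteration count,
   or -1 on allocation failure or if max_iter is reached first. */
int cg_solve(const csr_matrix *A, const double *b, double *x,
             double tol, int max_iter) {
    int n = A->n;
    double *r = malloc((size_t)n * sizeof *r);
    double *p = malloc((size_t)n * sizeof *p);
    double *Ap = malloc((size_t)n * sizeof *Ap);
    if (!r || !p || !Ap) { free(r); free(p); free(Ap); return -1; }

    csr_matvec(A, x, Ap);                       /* r = b - A x, p = r */
    for (int i = 0; i < n; ++i) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
    double rr = dot(n, r, r);
    double stop = tol * tol * dot(n, b, b);     /* compare squared norms */

    int k = 0;
    while (rr > stop && k < max_iter) {
        csr_matvec(A, p, Ap);
        double alpha = rr / dot(n, p, Ap);      /* step length along p */
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(n, r, r);
        double beta = rr_new / rr;              /* keeps search directions A-conjugate */
        for (int i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++k;
    }
    int converged = rr <= stop;
    free(r); free(p); free(Ap);
    return converged ? k : -1;
}
```

Production lattice QCD codes usually apply the fermion matrix on the fly, as a stencil over lattice sites and link variables, rather than storing it in CSR. The loops that SIMD vectorization and prefetching target have the same shape, however.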

Cited by 5 publications (13 citation statements). References: 12 publications.
“…An approach keeping the array of structure data layout and inserting pragmas [9] gives 225 GFlops (245 GFlops after correcting the difference in clock cycle). In our previous report [3], which corresponds to the layout 2 without redundant boundary data packing/copy, the best performance on single node was 340 GFlops (4 MPI proc./node). With the same condition, it becomes 369 GFlops whose improvement is mainly due to the refinement of the prefetch.…”
Section: Data Layout (mentioning)
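The quote attributes the gain to refined prefetching. The sketch below illustrates software prefetch in a loop, assuming GCC or Clang. `__builtin_prefetch(addr, rw, locality)` is a compiler builtin; on x86, an intrinsic such as `_mm_prefetch` plays the same role. The function `gather_axpy_prefetch` and the distance `PREFETCH_DIST` are hypothetical, and real codes tune the distance per loop and per cache level.

The indirect index table mimics a stencil reading neighbouring sites. This is the pattern where manual prefetch typically helps most, because hardware prefetchers follow regular strides better than index-driven gathers.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. */
#define PREFETCH_DIST 16

/* y[i] += a * x[nbr[i]]: an indirect (gather) access, as when a stencil
   reads a neighbouring lattice site through an index table. */
void gather_axpy_prefetch(size_t n, double a, const double *x,
                          const size_t *nbr, double *y) {
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DIST < n) {
            /* Operand needed PREFETCH_DIST iterations from now:
               rw = 0 (read), locality = 3 (keep in all cache levels). */
            __builtin_prefetch(&x[nbr[i + PREFETCH_DIST]], 0, 3);
            /* y is read and then written, so hint a write (rw = 1). */
            __builtin_prefetch(&y[i + PREFETCH_DIST], 1, 3);
        }
        y[i] += a * x[nbr[i]];
    }
}
```

Prefetch hints do not change results, only timing. The guard keeps every index inside the arrays, so the loop is correct for any `n`.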
“…As a testbed of our analysis, we choose two types of fermion matrices together with an iterative linear equation solver. In our previous report [2,3], we developed a code along the above policy and applied it to KNL. In this paper, in addition to improved performance, we rearrange these prescriptions so that each effect is more apparent.…”
Section: Introduction (mentioning)
“…Data layouts: SoA, 20,22,23,28,30,36,50,79,82,90; AoS, 9; AoSoA, 57,90. Data alignment: 6,9,14,18,20,24,44,45,52,53,66,79,84,90. Padding: 4,7,9,20,24,44,52,53,79,82,91. Dependency disambiguation: 15,28,36,82,91. Prefetching: Software, 4,7,9,14,17,22,23,40,41,50,…”
Section: Table 3, Optimization Strategies (mentioning)
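Table 3 lists data alignment and padding as separate strategies. The sketch below shows both, assuming C11 `aligned_alloc` and a 64-byte boundary, which is the KNL cache-line size and the width of a 512-bit vector register. The array length is rounded up to a whole number of 8-double vectors, and the padding is zero-filled. A vectorized loop therefore needs no scalar remainder, and the extra lanes do not change sums. `ALIGN_BYTES`, `padded_length`, `is_aligned` and `alloc_aligned_padded` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGN_BYTES 64   /* assumed: KNL cache line = one 512-bit vector register */
#define DOUBLES_PER_VEC (ALIGN_BYTES / sizeof(double))   /* 8 */

/* Round n up to a whole number of SIMD vectors. */
size_t padded_length(size_t n) {
    return (n + DOUBLES_PER_VEC - 1) / DOUBLES_PER_VEC * DOUBLES_PER_VEC;
}

/* Nonzero if p sits on an ALIGN_BYTES boundary. */
int is_aligned(const void *p) {
    return ((uintptr_t)p % ALIGN_BYTES) == 0;
}

/* Allocate padded_length(n) zero-filled doubles on a 64-byte boundary.
   C11 aligned_alloc expects a size that is a multiple of the alignment,
   which the padding guarantees. Release with free(). */
double *alloc_aligned_padded(size_t n, size_t *padded_n) {
    size_t m = padded_length(n);
    double *p = aligned_alloc(ALIGN_BYTES, m * sizeof(double));
    if (p != NULL) memset(p, 0, m * sizeof(double));   /* padding lanes read as 0.0 */
    if (padded_n != NULL) *padded_n = m;
    return p;
}
```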
“…Kanamori et al. [66] accelerate "lattice quantum chromodynamics" (QCD) code on KNL. For the complex vector data, the real and imaginary parts are placed consecutively in the memory.…”
Section: Prefetching (mentioning)
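The citation describes complex data with each real part followed by its imaginary part, which is the interleaved (AoS) form. The sketch below contrasts it with the AoSoA form also listed in Table 3. In AoSoA, each block of `VLEN` complex numbers stores `VLEN` real parts followed by `VLEN` imaginary parts. The sketch assumes `VLEN = 8` doubles (one 512-bit register). The functions are hypothetical illustrations of a complex AXPY, not the authors' kernels.

```c
#include <stddef.h>

#define VLEN 8  /* assumed SIMD width in doubles (512-bit vectors) */

/* Interleaved (AoS) complex storage: re0 im0 re1 im1 ...
   Computes y += a * x for complex a = ar + i*ai. */
void caxpy_interleaved(size_t n, double ar, double ai, const double *x, double *y) {
    for (size_t i = 0; i < n; ++i) {
        double xr = x[2 * i], xi = x[2 * i + 1];
        y[2 * i]     += ar * xr - ai * xi;
        y[2 * i + 1] += ar * xi + ai * xr;
    }
}

/* Repack interleaved data into AoSoA blocks: per block of VLEN complex
   numbers, VLEN real parts followed by VLEN imaginary parts.
   n must be a multiple of VLEN (pad beforehand otherwise). */
void interleaved_to_aosoa(size_t n, const double *in, double *out) {
    for (size_t b = 0; b < n / VLEN; ++b)
        for (size_t j = 0; j < VLEN; ++j) {
            size_t i = b * VLEN + j;
            out[2 * VLEN * b + j]        = in[2 * i];      /* real part */
            out[2 * VLEN * b + VLEN + j] = in[2 * i + 1];  /* imaginary part */
        }
}

/* Same update on AoSoA data: the inner j-loop reads unit-stride reals and
   unit-stride imaginaries, so each maps onto whole SIMD registers. */
void caxpy_aosoa(size_t n, double ar, double ai, const double *x, double *y) {
    for (size_t b = 0; b < n / VLEN; ++b) {
        const double *xr = x + 2 * VLEN * b, *xi = xr + VLEN;
        double *yr = y + 2 * VLEN * b, *yi = yr + VLEN;
        for (size_t j = 0; j < VLEN; ++j) {
            double r = xr[j], im = xi[j];
            yr[j] += ar * r - ai * im;
            yi[j] += ar * im + ai * r;
        }
    }
}
```

In the interleaved form, a vectorized complex multiply needs shuffles to pair the real and imaginary lanes. In the AoSoA form, the inner loop reads both as unit-stride streams. The cost is a repacking step and a length that must be a multiple of `VLEN`.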