2018
DOI: 10.1051/epjconf/201817502009

MILC Code Performance on High End CPU and GPU Supercomputer Clusters

Abstract: With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code wit…

Cited by 6 publications (5 citation statements)
References 5 publications
“…[32] and [33]. The performance of the MILC code, on various architectures, is enhanced by using QOP [34], QPhiX [35][36][37][38], or QUDA [39][40][41][42].…”
Section: MILC Collaboration (mentioning)
confidence: 99%
“…Overall, this takes 2N_c · 4 · 80 flops. [footnote 30] For 36 of the 40 stored directions this matrix is not unitary; compare the caption of Tab. 13.…”
Section: E3 Brillouin Laplace Operator (mentioning)
confidence: 99%
“…This brief exposition of the subject cannot do justice to the effort spent by other authors to maximize performance on a specific architecture for a given Dirac operator D. Recent review talks on the interplay between algorithms and machines in lattice QCD include [19][20][21][22][23]. In addition, there is a number of HPC projects in lattice QCD with similar objectives on several architectures [24][25][26][27][28][29][30][31][32][33][34][35]. Preliminary accounts [footnote 6] of this work were given in [36,37].…”
Section: Introduction (mentioning)
confidence: 99%
“…From a HPC viewpoint, a clear advantage of this operator with precomputed V_µ is that its stencil is restricted to sites which are at most one hop away. Still, it is not trivial to reach an acceptable performance on a many-core architecture [4,5].…”
Section: Staggered Kernel Details and Performance (mentioning)
confidence: 99%