2021
DOI: 10.1007/978-3-030-86976-2_15

Object-Oriented Implementation of Algebraic Multi-grid Solver for Lattice QCD on SIMD Architectures and GPU Clusters

Abstract: A portable implementation of elaborated algorithm is important to use variety of architectures in HPC applications. In this work we implement and benchmark an algebraic multi-grid solver for Lattice QCD on three different architectures, Intel Xeon Phi, Fujitsu A64FX, and NVIDIA Tesla V100, in keeping high performance and portability of the code based on the object-oriented paradigm. Some parts of code are specific to an architecture employing appropriate data layout and tuned matrix-vector multiplication kerne…

Citations: cited by 9 publications (10 citation statements)
References: 22 publications
“…We use single precision for the multigrid preconditioner. The general structure of our implementation which does not use QWS is found in [6].…”
Section: Implementation Details (mentioning)
confidence: 99%
“…In the figure, the setup time, which is necessary to prepare the null space vectors and called only once for each configuration, is also piled as a light color box to each elapsed time of the multigrid solver. LDDHMC is always the fastest to solve one equation because of the large overhead of the multigrid solver due to the setup process to generate null space vectors. [Footnote: Although the efficiency is lower, a SAP preconditioner without QWS is also available in Bridge++ and can be combined with the multigrid solver [6].]…”
Section: Performance (mentioning)
confidence: 99%
“…Recent supercomputers, however, adopt a variety of architecture: multi-core parallel machines with wide SIMD (A64FX and Intel processors), and clusters with accelerator devices such as GPUs, PEZY-SC, and vector processors (NEC SX-Aurora). Soon after the first public release of Bridge++ in 2012 [2], we had started to investigate possible extensions of our code to exploit these new architectures while keeping the readability and portability, as well as to develop tuning techniques for them [3,4,5,6,7,8]. Recently we have constructed a framework to incorporate the tuned codes as an alternative part to the previously developed Bridge++ code, and decided to release it as version 2.0.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some details of the implementation have been reported in Ref. [8] together with that of a multi-grid solver.…”
Section: Introduction (mentioning)
confidence: 99%