2018
DOI: 10.1051/epjconf/201817509002
Wilson and Domainwall Kernels on Oakforest-PACS

Abstract: We report the performance of Wilson and Domainwall Kernels on Oakforest-PACS, a new machine based on Intel Xeon Phi Knights Landing, which is co-hosted by the University of Tokyo and Tsukuba University and is currently the fastest in Japan. This machine uses Intel Omni-Path for the internode network. We compare the performance of several implementations, including one that makes use of the Grid library. The code is incorporated into the Bridge++ code set.

Cited by 4 publications (5 citation statements)
References 6 publications
“…We use lattice QCD code of Bridge++ [28,29] and its optimized version for the Oakforest-PACS by Dr. I. Kanamori [30] expected, sink smearings make the behavior of the wave function at short distances much smoother, so that the potential becomes almost single-valued even at short distances where discrete data points are sparsely located.…”
Section: Acknowledgements; citation type: mentioning
confidence: 99%
“…Wilson and Domainwall Kernels on Oakforest-PACS: I. Kanamori and H. Matsufuru developed for Bridge++ two different implementations of a Wilson/Domainwall kernel [13]. Their strategy was to have a direct comparison of two radically different implementations: a simple one (impl-1) and a more aggressive one (impl-2).…”
Section: EPJ Web of Conferences; citation type: mentioning
confidence: 99%
“…The original fixed data layout with double precision floating point numbers is generalized to flexible data layouts in double or single precisions. Exploratory implementations and tuning have been studied for the SIMD architecture with Intel AVX-512 [5][6][7], GPU architectures with OpenACC [8,9] or OpenCL [10? ], the PEZY-SC manycore accelerator [11], and a vector architecture of NEC SX-Aurora TSUBASA [4]. In the framework of Bridge++ version 2, these machine-specific implementations are provided as alternative codes to the original implementation which takes over the specific tasks such as calculation of fermion propagators.…”
Section: Introduction; citation type: mentioning
confidence: 99%