2011
DOI: 10.1016/j.jcp.2011.02.023

SU(2) lattice gauge theory simulations on Fermi GPUs

Abstract: In this work we explore the performance of CUDA in quenched lattice SU(2) simulations. CUDA (NVIDIA's Compute Unified Device Architecture) is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and the CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi) are also presented. To obtain high performance, the code must be optimized for the G…

Cited by 25 publications (22 citation statements)
References 14 publications
“…We present our results in lattice spacing units of a, with a = 0.07261(85) fm or a^{-1} = 2718 ± 32 MeV. We generate our configurations in NVIDIA GPUs of the FERMI series (480, 580 and Tesla 2070) with a SU(3) CUDA code upgraded from our SU(2) combination of Cabibbo-Marinari pseudoheatbath and over-relaxation algorithm [28,29]. Our SU(3) updates involve three SU(2) subgroups, we work with 9 complex numbers, and we reunitarize the matrix.…”
Section: Lattice QCD Results of the Pentaqu… (mentioning)
confidence: 99%
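
The reunitarization step mentioned in the statement above can be illustrated with a short sketch. It assumes one common approach (Gram-Schmidt on the first two rows, with the third row rebuilt from their cross product); the quoted text does not say which method the citing authors use, and the function name `reunitarize_su3` is hypothetical.

```python
import math


def reunitarize_su3(m):
    """Project a nearly unitary 3x3 complex matrix (list of 3 rows) onto SU(3).

    Assumption: Gram-Schmidt on rows 0 and 1, then row 2 = conj(row0 x row1).
    This is one standard choice, not necessarily the cited code's method.
    """

    def dot(u, v):
        # Hermitian inner product <u, v> = sum_i conj(u_i) * v_i
        return sum(a.conjugate() * b for a, b in zip(u, v))

    def normalize(u):
        norm = math.sqrt(dot(u, u).real)
        return [a / norm for a in u]

    row0 = normalize(m[0])

    # Remove the component of row 1 along row 0, then normalize.
    overlap = dot(row0, m[1])
    row1 = normalize([b - overlap * a for a, b in zip(row0, m[1])])

    # The conjugated cross product is orthogonal to both rows and fixes det = 1.
    row2 = [
        (row0[1] * row1[2] - row0[2] * row1[1]).conjugate(),
        (row0[2] * row1[0] - row0[0] * row1[2]).conjugate(),
        (row0[0] * row1[1] - row0[1] * row1[0]).conjugate(),
    ]
    return [row0, row1, row2]
```

Because the third row is fully determined by the first two, a code following this scheme could in principle keep only two rows in memory; whether the cited code does so is not stated in the excerpt, which mentions working with all 9 complex numbers.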
“…To store the lattice array in global memory, we use a SOA type array as described in [18]. The main reason to do this is due to the FFT implementation algorithm, Algo.…”
Section: GPU Implementation (mentioning)
confidence: 99%
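
As a rough illustration of the SOA (structure-of-arrays) layout mentioned in the statement above, the sketch below compares the flat index of a link component under an array-of-structures ordering and an SOA ordering. The function names, the argument order, and the choice of 4 directions with 4 components per link are assumptions made for the example, not details taken from the cited paper.

```python
def aos_index(site, mu, comp, n_dirs, n_comp):
    """Array-of-structures: all components of one link are stored contiguously."""
    return (site * n_dirs + mu) * n_comp + comp


def soa_index(site, mu, comp, volume, n_dirs):
    """Structure-of-arrays: one contiguous block per (component, direction) pair."""
    return (comp * n_dirs + mu) * volume + site


# Hypothetical sizes for illustration only.
volume, n_dirs, n_comp = 16, 4, 4

# Distance in memory between the same component on neighbouring site indices.
soa_stride = soa_index(1, 0, 0, volume, n_dirs) - soa_index(0, 0, 0, volume, n_dirs)
aos_stride = aos_index(1, 0, 0, n_dirs, n_comp) - aos_index(0, 0, 0, n_dirs, n_comp)
print("SOA stride:", soa_stride, "AOS stride:", aos_stride)
```

With one GPU thread per lattice site, the SOA ordering places the values read by consecutive threads at consecutive addresses, which is the access pattern that GPU global memory coalesces well.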
“…Lattice QCD simulations is a typical and well known HPC grand challenge, where physics results are strongly limited by available computational resources [3,4]; over the years, several generations of parallel machines, optimized for LQCD, have been developed [5,6], while the development of LQCD codes running on many core architectures, in particular GPUs, has seen large efforts in the last decade [7][8][9]. Our target is to have a single code able to run on several processors without any major code change while looking for an acceptable trade-off between portability and efficiency [10].…”
Section: Introduction (mentioning)
confidence: 99%