2015
DOI: 10.1007/978-3-319-17248-4_4

A CUDA Implementation of the High Performance Conjugate Gradient Benchmark

Citations: cited by 17 publications (21 citation statements)
References: 7 publications
“…For the full-system tests, the overheads of the halo exchange and the global collective with respect to the overall HPCG runtime are only around 7.3% and 5.0%, respectively. For comparison purposes, Table 1 summarizes the HPCG results on several other systems, collected from both published results [20,30,31] and the official HPCG list of June 2017 [9]. It can be seen that although the HPCG-to-HPL ratio of the Sunway platform is relatively low because of the highly limited data-moving capability, the Flop/Byte efficiency [37], which measures the ratio of the HPCG performance to the total memory bandwidth, is comparable to other systems.…”
Section: Full-system Results (mentioning)
confidence: 99%
“…HPCG has drawn increasing attention from both academics and industry since its announcement in 2013. For example, a multicolor reordering technique was employed to improve the performance of HPCG on CPU-GPU heterogeneous clusters [31]. The optimization of HPCG on the K supercomputer was done in Kumahata et al [20], where a block multicoloring method was employed for the parallelization of SymGS.…”
Section: Related Work (mentioning)
confidence: 99%
“…HPL is a benchmark program that determines the solution to Ax = b, which denotes a large-scale dense matrix problem of a linear equation. The performance of HPL is determined by the 64-bit floating-point operation used in multiplication of the dense matrix, which is a major calculation in the methodology of the benchmark program [5,22]. The FLOPS value, which is obtained from HPL, is used as a measure of supercomputing performance in the TOP500 Project, which presents a list of the top 500 fastest supercomputers in the world since 1993.…”
Section: Prior Studies On Supercomputer Performance Measurement (mentioning)
confidence: 99%
“…However, the majority of current applications compute differential equations that require high memory bandwidth and irregular data access. As a consequence, there is a low correlation between the performances of HPL and the application [22,23,24].…”
Section: Prior Studies On Supercomputer Performance Measurement (mentioning)
confidence: 99%