Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2021
DOI: 10.1145/3458817.3476158

Enabling and scaling the HPCG benchmark on the newest generation Sunway supercomputer with 42 million heterogeneous cores

citations
Cited by 23 publications
(11 citation statements)
references
References 39 publications
“…Note that we also observe a similar execution time distribution on other hardware platforms used in this paper. Moreover, independent studies have shown that SYMGS and SPMV operations are memory-bounded in HPCG, each has a low computation-to-memory ratio of 0.152 and 0.156 flops/byte, respectively [63]. Such a low arithmetic intensity further highlights the need of memory-aware optimization for MG.…”
Section: Overhead Of Symgs (mentioning)
confidence: 99%
“…However, little work has attempted to optimize the memory access latency of SYMGS on multi-core CPUs. SYMGS is known to be memory-bounded because the algorithm needs to access large, sparse matrices that cannot fit into the last level cache and the kernel computation has a low arithmetic intensity [63]. Reducing the memory access latency is essential for gaining further performance improvement for SYMGS.…”
Section: Introduction (mentioning)
confidence: 99%
“…It turns out such prohibition limits the performance drastically. It is reported that implementing the HPCG computation in a matrix-free form significantly improves the performance, by 4.67× on the New Sunway supercomputer [Zhu et al 2021]. When possible, HPC researchers still seek matrix-free approaches even for implicit approaches, e.g., the 2016 Gordon Bell Prize winner [Yang et al 2016] designed and manually implemented a geometry-based pipelined ILU method that maps the data dependency to hardware-supported inter-core communication, which is a case for further optimizing with the sparsity pattern in hand.…”
Section: Solving Differential Equations On Structured Grids (mentioning)
confidence: 99%
“…Efforts have been made to optimize SpMV by optimizing the sparse matrix storage format [5], [9]- [11] and the computation kernel [12], [13]. These prior optimizations have primarily focused on the computation of a single, isolated SpMV invocation.…”
Section: Introduction (mentioning)
confidence: 99%