2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2019
DOI: 10.1109/ipdps.2019.00019

Double-Precision FPUs in High-Performance Computing: An Embarrassment of Riches?

Abstract: Among the (uncontended) common wisdom in High-Performance Computing (HPC) is the applications' need for large amount of double-precision support in hardware. Hardware manufacturers, the TOP500 list, and (rarely revisited) legacy software have without doubt followed and contributed to this view. In this paper, we challenge that wisdom, and we do so by exhaustively comparing a large number of HPC proxy applications on two processors: Intel's Knights Landing (KNL) and Knights Mill (KNM). Although similar, the KNL …

Cited by 15 publications (7 citation statements)
References 42 publications
“…The HPC community has been testing Arm-based architectures for a few years now [1], [2], [3], and Supercomputer Fugaku [4] is the first large-scale system in the top-end of the TOP500 list, which demonstrates the competitiveness of Arm in a space which recently had been dominated by Intel, AMD, and Nvidia. The benefits of Arm CPUs paired with high bandwidth memory, as in the case of Fujitsu's A64FX processor [5], for the HPC field are clear: (1) Arm CPUs are highly customizable, energy efficient, and there is an existing ecosystem of software, compilers, tools, etc., which is readily available (unlike for the K computer with its SPARC CPU); and (2) most applications executed on HPC systems tend to be memory-bandwidth-bound, as we have shown in a previous study [6]. Although, a different compute-to-bandwidth ratio, as found in A64FX, might challenge this view in individual cases resulting in a greater influence by the compiler onto the performance.…”
Section: Introduction (mentioning)
confidence: 84%
“…ECP proxy-apps and RIKEN Fiber mini-apps are collections of so called proxy applications which are smaller representative codes and inputs for production applications commonly executed on supercomputers in the USA and Japan. We have studied these codes previously [6], [11], and we refer the reader to these publications for details.…”
Section: Benchmarks - From Micro To Macro Level (mentioning)
confidence: 99%
“…Although our method is limited to inner product based computations, it extends the application range of hardware with limited (or no) FP32/FP64 resources and fast low-precision processing units for general purpose workloads. Consequently, we can consider reducing the number of FP64 (or even FP32) FPUs, as discussed by Domke et al [3], by exchanging them with low-precision FPUs such as Tensor Cores. Our rationale is supported by the following situations.…”
Section: Discussion (mentioning)
confidence: 99%
“…Domke et al. [51] note that conventional chips allocated a large portion of silicon area to DP computing units, however, recent processors, including KNM, allocate a large portion of chip area to single/half-precision/integer units. They study the impact of this change on the performance of HPC applications, when they run on KNL, KNM, and a Broadwell CPU.…”
Section: Machine Learning (mentioning)
confidence: 99%