Previous studies into performance portability have typically analysed a single application (and its various implementations) in isolation. In this study we explore the wider landscape of performance portability by considering a number of applications from across the space of dwarfs, written in multiple parallel programming models, and across a diverse set of architectures. We apply rigorous performance portability metrics, as defined by Pennycook et al [1]. We believe this is the broadest and most rigorous performance portability study to date, representing a far reaching exploration of the state of performance portability that is achievable today. We will present a summary of the performance portability of each application and programming model across our diverge range of twelve computer architectures, including six different server CPUs from five different vendors, five different GPUs from two different vendors, and one vector architecture. We will conclude with an analysis of the performance portability of key programming models in general, across different application spaces as well across differing architectures, allowing us to comment on more general performance portability principles.
Abstract. Many scientific codes consist of memory bandwidth bound kernels -the dominating factor of the runtime is the speed at which data can be loaded from memory into the Arithmetic Logic Units, before results are written back to memory. One major advantage of many-core devices such as General Purpose Graphics Processing Units (GPGPUs) and the Intel Xeon Phi is their focus on providing increased memory bandwidth over traditional CPU architectures. However, as with CPUs, this peak memory bandwidth is usually unachievable in practice and so benchmarks are required to measure a practical upper bound on expected performance.The choice of one programming model over another should ideally not limit the performance that can be achieved on a device. GPU-STREAM has been updated to incorporate a wide variety of the latest parallel programming models, all implementing the same parallel scheme. As such this tool can be used as a kind of Rosetta Stone which provides both a cross-platform and cross-programming model array of results of achievable memory bandwidth.
Summary In this paper, we present performance results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimized specifically for HPC. Isambard is the first Cray XC50 “Scout” system, combining Cavium ThunderX2 Arm‐based CPUs with Cray's Aries interconnect. The full Isambard system will be delivered in the summer of 2018, when it will contain over 10 000 Arm cores. In this work, we present node‐level performance results from eight early‐access nodes that were upgraded to B0 beta silicon in March 2018. We present node‐level benchmark results comparing ThunderX2 with mainstream CPUs, including Intel Skylake and Broadwell, as well as Xeon Phi. We focus on a range of applications and mini‐apps important to the UK national HPC service, ARCHER, as well as to the Isambard project partners and the wider HPC community. We also compare performance across three major software toolchains available for Arm: Cray's CCE, Arm's version of Clang/Flang/LLVM, and GNU.
With Exascale machines on our immediate horizon, there is a pressing need for applications to be made ready to best exploit these systems. However, there will be multiple paths to Exascale, with each system relying on processor and accelerator technologies from different vendors. As such, applications will be required to be portable between these different architectures, but it is also critical that they are efficient too. These double requirements for portability and efficiency begets the need for performance portability. In this study we survey the performance portability of different programming models, including the open standards OpenMP and SYCL, across the diverse landscape of Exascale and pre-Exascale processors from Intel, AMD, NVIDIA, Fujitsu, Marvell, and Amazon, together encompassing GPUs and CPUs based on both x86 and Arm architectures. We also take a historical view and analyse how performance portability has changed over the last year.
The ability to detect radiation in microfluidic devices is important for the on-chip analysis of radiopharmaceuticals, but previously reported systems have largely suffered from various limitations including cost, complexity of fabrication, and insufficient sensitivity and/or speed. Here, we present the use of sensitive, low cost, small-sized, commercially available silicon photomultipliers (SiPMs) for the detection of radioactivity inside microfluidic channels fabricated from a range of conventional microfluidic chip substrates. We demonstrate the effects of chip material and thickness on the detection of the positron-emitting isotope, [(18)F]fluoride, and find that, while the SiPMs are light sensors, they are able to detect radiation even through opaque chip materials via direct positron and gamma (γ) ray interaction. Finally, we employed the SiPM platform for analysis of the PET (positron emission tomography) radiotracers 2-[(18)F]fluoro-2-deoxy-d-glucose ([(18)F]FDG) and [(68)Ga]gallium-citrate, and highlight the ability to detect the γ ray emitting SPECT (single photon emission computed tomography) radiotracer, [(99m)Tc]pertechnetate.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.