Over the past decade, accelerator-based supercomputers have grown from a 0% to a 42% performance share on the TOP500. Ideally, GPU-accelerated code on such systems should be "write once, run anywhere," regardless of the GPU device (or, for that matter, any parallel device, e.g., CPU or FPGA). In practice, however, portability can be significantly more limited due to the sheer volume of code implemented in non-portable languages. For example, the tremendous success of CUDA, as evidenced by the vast cornucopia of CUDA-accelerated applications, makes it infeasible to manually rewrite all these applications to achieve portability. Consequently, we achieve portability by using our automated CUDA-to-OpenCL source-to-source translator, CU2CL. To demonstrate the state of the practice, we use CU2CL to automatically translate three medium-to-large, CUDA-optimized codes to OpenCL, thus enabling the codes to run on other GPU-accelerated systems (as well as CPU- or FPGA-based systems). These automatically translated codes deliver performance portability, including as much as a threefold performance improvement, on a GPU device not supported by CUDA.
Abstract: The use of accelerators in high-performance computing is increasing. The most commonly used accelerator is the graphics processing unit (GPU) because of its low cost and massively parallel performance. The two most common programming environments for GPU accelerators are CUDA and OpenCL. While CUDA runs natively only on NVIDIA GPUs, OpenCL is an open standard that can run on a variety of hardware processing platforms, including NVIDIA GPUs, AMD GPUs, and Intel or AMD CPUs. Given the abundance of GPU applications written in CUDA, we seek to leverage this investment in CUDA and enable CUDA programs to "run anywhere" via a CUDA-to-OpenCL source-to-source translator. The resultant OpenCL versions permit the GPU-accelerated codes to run on a wider variety of processors than would otherwise be possible. However, robust source-to-source translation from CUDA to OpenCL faces a myriad of challenges. As such, this paper identifies those challenges and presents a classification of CUDA language idioms that present practical impediments to automatic translation.
Abstract: To attain scalable performance efficiently, the HPC community expects future exascale systems to consist of multiple nodes, each with different types of hardware accelerators. In addition to GPUs and Intel MICs, additional candidate accelerators include embedded multiprocessors and FPGAs. End users need appropriate tools to efficiently use the available compute resources in such systems, both within a compute node and across compute nodes. As such, we present MetaMorph, a library framework designed to (automatically) extract as much computational capability as possible from HPC systems. Its design centers around three core principles: abstraction, interoperability, and adaptivity. To demonstrate its efficacy, we present a case study that uses the structured grids design pattern, which is heavily used in computational fluid dynamics. We show how MetaMorph significantly reduces the development time, while delivering performance and interoperability across an array of heterogeneous devices, including multicore CPUs, Intel MICs, AMD GPUs, and NVIDIA GPUs.
In this paper, we discuss the use of general-purpose GPUs for simulating the flapping flight of a fruit bat. The highly deformable prescribed wing motion is simulated using an Immersed Boundary Method (IBM). These computations are optimized for the GPU platform using CUDA Fortran to obtain an approximately sixfold speedup over CPUs. A strong-scaling study and code profiling are performed to understand the code's characteristics and to develop a path for future improvements.