Current GPUs are massively parallel multicore processors optimised for workloads with a large degree of SIMD parallelism. Good performance requires highly idiomatic programs, whose development is work-intensive and requires expert knowledge.

To raise the level of abstraction, we propose a domain-specific high-level language of array computations that captures appropriate idioms in the form of collective array operations. We embed this purely functional array language in Haskell with an online code generator for NVIDIA's CUDA GPGPU programming environment. We regard the embedded language's collective array operations as algorithmic skeletons; our code generator instantiates CUDA implementations of those skeletons to execute embedded array programs.

This paper outlines our embedding in Haskell, details the design and implementation of the dynamic code generator, and reports on initial benchmark results. These results suggest that we can compete with moderately optimised native CUDA code, while enabling much simpler source programs.
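The "collective array operation" style described above can be sketched in plain Haskell. This is an illustrative sketch only, using ordinary lists in place of the embedded language's array type: in Accelerate itself these combinators operate on an embedded `Acc` array type and each maps onto a CUDA skeleton.

```haskell
-- A sketch of the collective-operation programming model, assuming
-- plain Haskell lists stand in for the embedded language's arrays.
-- The whole computation is expressed as a composition of collective
-- operations (zipWith, fold) rather than explicit loops, which is
-- exactly the idiom that maps well onto SIMD hardware.

-- Dot product: an element-wise product followed by a reduction.
dotp :: Num a => [a] -> [a] -> a
dotp xs ys = foldr (+) 0 (zipWith (*) xs ys)
```

Because the program is built from a fixed vocabulary of collective operations, a code generator can instantiate a parallel implementation of each operation instead of analysing arbitrary loops.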
Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs. However, the naive compilation of such programs quickly leads to both code explosion and an excessive use of intermediate data structures. The resulting slowdown is not acceptable on target hardware that is usually chosen to achieve high performance.

In this paper, we discuss two optimisation techniques, sharing recovery and array fusion, that tackle code explosion and eliminate superfluous intermediate structures. Both techniques are well known from other contexts, but they present unique challenges for an embedded language compiled for execution on a GPU. We present novel methods for implementing sharing recovery and array fusion, and demonstrate their effectiveness on a set of benchmarks.
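The intermediate-structure problem that fusion removes can be seen in a minimal sketch, assuming the classic map/map fusion rule on plain lists; Accelerate's fuser works on a much richer set of collective array operations, but the argument is the same.

```haskell
-- Unfused: 'map (*2) xs' materialises an intermediate list before
-- 'map (+1)' traverses it again.
unfused :: [Int] -> [Int]
unfused xs = map (+1) (map (*2) xs)

-- Fused: the map/map rule 'map f . map g = map (f . g)' yields a
-- single traversal with no intermediate structure.
fused :: [Int] -> [Int]
fused = map ((+1) . (*2))
```

On a GPU, each unfused intermediate array also means an extra round trip through device memory, which is why eliminating them matters more there than on a CPU.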
We present a comparative ab initio survey of possible dissociation products of NH3, PH3, and AsH3 on the Si(001) surface. In agreement with previous studies, we find that the relative energetics of XH3 and XH2 species (X = N, P, As) are common across all three systems. In contrast, the energetics of the onward dissociation into XH and X species differs markedly between nitrogen on the one hand, and phosphorus and arsenic on the other.
Embedded languages are often compiled at application runtime; thus, embedded compile-time errors become application runtime errors. We argue that advanced type system features, such as GADTs and type families, play a crucial role in minimising such runtime errors. Specifically, a rigorous type discipline reduces runtime errors due to bugs in both embedded language applications and the implementation of the embedded language compiler itself. In this paper, we focus on the safety guarantees achieved by type preserving compilation. We discuss the compilation pipeline of Accelerate, a high-performance array language targeting both multicore CPUs and GPUs, where we are able to preserve types from the source language down to a low-level register language in SSA form. Specifically, we demonstrate the practicability of our approach by creating a new type-safe interface to the industrial-strength LLVM compiler infrastructure, which we used to build two new Accelerate backends that show competitive runtimes on a set of benchmarks across both CPUs and GPUs.
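The role of GADTs in ruling out embedded compile-time errors can be illustrated with a minimal sketch. This is far simpler than Accelerate's actual typed ASTs, but shows the principle: ill-typed terms are unrepresentable, so an interpreter (or code generator) needs no runtime type checks.

```haskell
{-# LANGUAGE GADTs #-}

-- A minimal typed expression language. The GADT indexes each
-- constructor by the type of value it produces, so a term like
-- 'Add (BoolE True) (IntE 1)' is rejected by the Haskell type
-- checker at (application) compile time.
data Exp a where
  IntE  :: Int  -> Exp Int
  BoolE :: Bool -> Exp Bool
  Add   :: Exp Int  -> Exp Int -> Exp Int
  If    :: Exp Bool -> Exp a   -> Exp a -> Exp a

-- The evaluator is total over well-typed terms: no tag checks,
-- no error cases for type mismatches.
eval :: Exp a -> a
eval (IntE n)   = n
eval (BoolE b)  = b
eval (Add x y)  = eval x + eval y
eval (If c t e) = if eval c then eval t else eval e
```

Type-preserving compilation extends this idea through every pipeline stage: each intermediate representation is indexed by the types of the terms it contains, so a compiler bug that would produce an ill-typed program fails to type-check in the compiler itself.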
Flat data-parallel array languages suffer from poor modularity. Despite being established as a high-level and expressive means of programming parallel architectures, the fact that they do not support nested arrays, and that parallel functions cannot be called from within parallel contexts, limits their usefulness to a select few domains. Nested data-parallel languages solve this problem, but they also assume irregularity of all nested structures. This places a cost on nesting, a cost that is needlessly paid for a certain class of programs: those where nesting is strictly regular.

A second limitation of such languages is that arrays, by definition, allow random access. This means that they must be loaded into memory in their entirety. If memory is limited, as is the case with GPUs, array languages offer no high-level means of expressing programs that contain structures too large for that memory but that do not require random access. This dissertation describes an extension to the Accelerate language: irregular array sequences. They allow a limited, but still useful, form of nesting, as well as a streaming execution model that can work under limited memory. Furthermore, in realising irregular array sequences, we describe a generalised program flattening (vectorisation) transform that does not introduce the unnecessary cost on nesting that is incurred for strictly regular (sub)programs. This transform applies to a much broader domain than just array sequences.

As a further complement to this work, this dissertation also describes two other extensions to Accelerate: a foreign function interface (FFI) and GPU-aware garbage collection. The former is the first instance of an embedded domain-specific language having an FFI, and both are of considerable practical value to programmers using the Accelerate framework.
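The flattened representation underlying irregular nested arrays can be sketched with the standard segment-descriptor encoding: a nested array is stored as one flat data array plus the lengths of its subarrays. The names below are illustrative, not Accelerate's API.

```haskell
-- A sketch of the segment-descriptor encoding used by flattening
-- (vectorisation) transforms: subarrays of differing lengths are
-- concatenated into one flat array, and a separate descriptor
-- records where each segment begins and ends.
data Segmented a = Segmented
  { segLengths :: [Int]  -- length of each subarray
  , segData    :: [a]    -- all elements, concatenated
  } deriving (Eq, Show)

-- Flatten a nested (irregular) structure into the segmented form.
flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- Recover the nested structure from the segmented form.
unflatten :: Segmented a -> [[a]]
unflatten (Segmented [] _)      = []
unflatten (Segmented (n:ns) xs) =
  let (seg, rest) = splitAt n xs
  in seg : unflatten (Segmented ns rest)
```

Because the flat data array can be processed without regard to segment boundaries for many operations, this encoding is what lets nested parallelism execute on flat SIMD hardware, and it is also what makes streaming over a sequence of segments possible when the whole structure does not fit in memory.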