GPU Computing Gems Jade Edition 2012
DOI: 10.1016/b978-0-12-385963-1.00027-7

GPU Scripting and Code Generation with PyCUDA

Abstract: High-level scripting languages are in many ways polar opposites to GPUs. GPUs are highly parallel, subject to hardware subtleties, and designed for maximum throughput, and they offer a tremendous advance in the performance achievable for a significant number of computational problems. On the other hand, scripting languages such as Python favor ease of use over computational speed and do not generally emphasize parallelism. PyCUDA is a package that attempts to join the two together. This chapter argues that in …
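As a minimal sketch of the approach the abstract describes, the following example (illustrative only, not taken from the chapter itself, and written in the style of the PyCUDA documentation) compiles a hand-written CUDA C kernel from a Python string at run time and launches it on a NumPy array:

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# Compile CUDA C source at run time; PyCUDA invokes nvcc and
# caches the resulting binary.
mod = SourceModule("""
__global__ void double_them(float *a)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    a[idx] *= 2.0f;
}
""")
double_them = mod.get_function("double_them")

a = np.random.randn(256).astype(np.float32)
a_gpu = cuda.mem_alloc(a.nbytes)  # device allocation
cuda.memcpy_htod(a_gpu, a)        # host -> device

double_them(a_gpu, block=(256, 1, 1), grid=(1, 1))

result = np.empty_like(a)
cuda.memcpy_dtoh(result, a_gpu)   # device -> host
assert np.allclose(result, 2 * a)
```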

Cited by 16 publications (16 citation statements); references 3 publications.
“…Our Matlab implementation, which is, however, not optimized for speed and logs large quantities of intermediate results, takes about three times as long. A Python implementation using PyCUDA (Klöckner et al. 2009) for GPU-enabled computation of the discrete Fourier transform (see Eqs. (A.2) and (A.3)) achieves a runtime of less than 10 min on a low-cost NVIDIA GeForce GT 430.…”
Section: Results on Simulated Data
confidence: 99%
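The cited paper's GPU code is not reproduced here, and a production implementation would typically call an optimized FFT library. Purely as an assumption-laden sketch of how a discrete Fourier transform can be written by hand in PyCUDA, the kernel below assigns one thread per output frequency:

```python
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# Naive O(N^2) DFT: thread k accumulates X[k] = sum_j x[j] e^{-2 pi i j k / N}.
mod = SourceModule("""
__global__ void dft(const float *x, float *re, float *im, int n)
{
    int k = threadIdx.x + blockIdx.x * blockDim.x;
    if (k >= n) return;
    float sr = 0.0f, si = 0.0f;
    for (int j = 0; j < n; ++j) {
        // Reduce k*j modulo n in integer arithmetic first, so the
        // float32 angle stays small and accurate.
        int m = (k * j) % n;
        float ang = -2.0f * 3.14159265f * m / n;
        sr += x[j] * cosf(ang);
        si += x[j] * sinf(ang);
    }
    re[k] = sr;
    im[k] = si;
}
""")
dft = mod.get_function("dft")

n = 1024
x = np.random.randn(n).astype(np.float32)
x_gpu = gpuarray.to_gpu(x)
re = gpuarray.empty(n, np.float32)
im = gpuarray.empty(n, np.float32)

dft(x_gpu, re, im, np.int32(n),
    block=(256, 1, 1), grid=((n + 255) // 256, 1))

# Loose tolerance to allow for float32 accumulation error.
assert np.allclose(re.get() + 1j * im.get(), np.fft.fft(x), atol=1e-1)
```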
“…Front-end programming models: Many systems provide GPU support in a high-level language: C++ [45], Java [99, 8, 81, 24], Matlab [7, 80], Python [25, 64]. While some go beyond simple GPU API bindings and provide support for compiling the high-level language to GPU code, none have Dandelion's cluster-scale support; unlike Dandelion, all expose the underlying device abstraction.…”
Section: Related Work
confidence: 99%
“…The earliest attempts were to create wrappers around the CUDA and OpenCL APIs that still require the programmer to write the kernel code by hand and expose a few vendor-specific libraries. Such attempts include PyCUDA [11] and PyOpenCL [12]. The current version of MATLAB's proprietary parallel computing toolbox also falls into this category at the time of writing.…”
Section: Related Work
confidence: 99%
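As an illustration of the "wrapper" style this statement describes (a sketch only, not code from the cited papers), PyCUDA's ElementwiseKernel still takes the kernel body as hand-written CUDA C, while the wrapper handles compilation, memory management, and the launch:

```python
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel

# The argument list and the operation are literal CUDA C written
# by the programmer; PyCUDA generates the surrounding loop,
# compiles the kernel, and manages the launch configuration.
lin_comb = ElementwiseKernel(
    "float a, float *x, float b, float *y, float *z",
    "z[i] = a * x[i] + b * y[i]",
    "lin_comb")

x = gpuarray.to_gpu(np.random.randn(1024).astype(np.float32))
y = gpuarray.to_gpu(np.random.randn(1024).astype(np.float32))
z = gpuarray.empty_like(x)

lin_comb(2.0, x, 3.0, y, z)
assert np.allclose(z.get(), 2 * x.get() + 3 * y.get())
```

Note that only the per-element expression is supplied by the user; the indexing loop and kernel boilerplate are generated by the wrapper, which is exactly the division of labor the quoted statement attributes to PyCUDA and PyOpenCL.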