2013
DOI: 10.1145/2544174.2500595

Optimising purely functional GPU programs

Abstract: Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs. However, the naive compilation of such programs quickly leads to both code explosion and an excessive use of intermediate data structures. The resulting slowdown is not acceptable on target hardware that is usually chosen to achieve high performance. In this paper, we discuss two optimisation techniques, sharing recovery and array fusion, that tackle code explosion and eliminate superfluous intermediate structures. Both…
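Both techniques apply to collective array operations of the kind shown below. This is a minimal sketch using the classic dot-product example as it is commonly written against the Accelerate API (running it needs a backend, e.g. the CUDA backend, which is not shown here): array fusion lets the zipWith and the fold compile to a single GPU pass with no intermediate array of products.

import Data.Array.Accelerate as A

-- Dot product of two vectors. Naively, zipWith would materialise an
-- intermediate array of pairwise products; array fusion pushes the
-- multiplication into the fold so only one traversal is generated.
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)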

Cited by 31 publications (44 citation statements)
References 36 publications (55 reference statements)
“…There are many EDSLs that rely on a higher-order interface towards the user and a first-order representation for analysis and code generation: Lava [3], Pan [6], Nikola [8], Accelerate [10], Obsidian [12] and Feldspar [1], to name some. All of these EDSLs employ some kind of higher-order to first-order conversion.…”
Section: Discussion and Related Work (mentioning)
confidence: 99%
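To make the "higher-order to first-order conversion" mentioned above concrete, here is a generic sketch of the idea (illustrative names only; this is not the API of Lava, Accelerate, Feldspar, or any other of the cited EDSLs): the user-facing embedding represents binders as ordinary Haskell functions, and the conversion reifies them into an AST with explicit, numbered variables that the compiler can analyse.

-- Higher-order (HOAS) surface representation: binders are Haskell functions.
data HExp
  = HLit Int
  | HAdd HExp HExp
  | HApp HExp HExp
  | HLam (HExp -> HExp)
  | HVar Int              -- introduced only during conversion

-- First-order representation with explicit binders, suitable for
-- analysis and code generation.
data FExp
  = FLit Int
  | FAdd FExp FExp
  | FApp FExp FExp
  | FLam Int FExp         -- the binder carries the variable it introduces
  | FVar Int
  deriving Show

-- Convert by feeding each Haskell-level binder a freshly numbered variable.
toFirstOrder :: HExp -> FExp
toFirstOrder = go 0
  where
    go _ (HLit n)   = FLit n
    go _ (HVar v)   = FVar v
    go i (HAdd a b) = FAdd (go i a) (go i b)
    go i (HApp f a) = FApp (go i f) (go i a)
    go i (HLam f)   = FLam i (go (i + 1) (f (HVar i)))

-- The user writes \x -> x + 1 as ordinary Haskell; the compiler sees an AST.
example :: FExp
example = toFirstOrder (HLam (\x -> HAdd x (HLit 1)))
-- => FLam 0 (FAdd (FVar 0) (FLit 1))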
“…Mainland [2] recently extended Template Haskell with support for quasiquoting arbitrary programming languages, which greatly simplifies writing code generators that produce complex C, CUDA, OpenCL, or Objective-C code by writing code templates in the syntax of the generated language; for example, Accelerate, an embedded language for GPU programming, makes extensive use of that facility to generate CUDA GPU code [3]. In this demo, I will show that quasiquoting also enables a new form of language interoperability. Here is a simple example using Objective-C:

nslog :: String -> IO ()
nslog msg = $(objc ['msg :> ''String] (void [cexp| NSLog(@"A message from Haskell: %@", msg) |]))…”

Section: Extended Abstract (mentioning)
confidence: 99%
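As a rough illustration of the code-generation style this statement describes, here is a sketch using the quasiquoters from Mainland's language-c-quote package (the exact splice forms and instances are recalled from memory and should be treated as assumptions; the function and parameter names are invented for the example): a Haskell function builds a C function AST from a template, splicing in the function name and a scale factor, which can then be pretty-printed to C or CUDA source.

{-# LANGUAGE QuasiQuotes #-}
import Language.C.Quote.C            -- quasiquoters such as cfun, cexp
import qualified Language.C.Syntax as C

-- Generate a C function that scales an array in place. The splices
-- $id:name and $exp:factor insert the Haskell values 'name' and
-- 'factor' into the quoted C template.
scaleFunc :: String -> Float -> C.Func
scaleFunc name factor =
  [cfun|
    void $id:name(float *xs, int n) {
      int i;
      for (i = 0; i < n; ++i)
        xs[i] = xs[i] * $exp:factor;
    }
  |]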
“…Accelerate [23] uses an elaboration of the delayed arrays representation from Repa, and in particular manages to avoid duplicating work. All array operations have a uniform representation as constructors for delayed arrays, on which fusion is performed by tree contraction.…”
Section: Related Work (mentioning)
confidence: 99%
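For readers unfamiliar with delayed arrays, here is a generic sketch of the representation the statement refers to (illustrative names only; this is neither Repa's nor Accelerate's actual code, and it shows neither the work-duplication avoidance nor the tree-contraction fusion mentioned above): an array is an extent plus an index function, so composed operations fuse into one traversal and only an explicit force allocates memory.

-- A delayed array is described by its size and a function from index to value.
data Delayed a = Delayed
  { extent :: Int          -- number of elements
  , index  :: Int -> a     -- compute the element at a given position
  }

mapD :: (a -> b) -> Delayed a -> Delayed b
mapD f (Delayed n ix) = Delayed n (f . ix)

zipWithD :: (a -> b -> c) -> Delayed a -> Delayed b -> Delayed c
zipWithD f (Delayed n ix) (Delayed m iy) =
  Delayed (min n m) (\i -> f (ix i) (iy i))

-- Only 'force' traverses and materialises; everything composed before it
-- is fused into a single pass.
force :: Delayed a -> [a]
force (Delayed n ix) = [ ix i | i <- [0 .. n - 1] ]

fromList :: [a] -> Delayed a
fromList xs = Delayed (length xs) (xs !!)

-- mapD f (mapD g xs) builds one index function (f . g . ix); no
-- intermediate array for (mapD g xs) is ever created.
example :: [Int]
example = force (mapD (+ 1) (mapD (* 2) (fromList [1, 2, 3])))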