2020
DOI: 10.48550/arxiv.2003.06324
Preprint

Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs

Abstract: Achieving high-performance GPU kernels requires optimizing algorithm implementations for the targeted GPU architecture. It is of utmost importance to fully use the compute and memory hierarchy, as well as available specialised hardware. Currently, vendor libraries like cuBLAS and cuDNN provide the best-performing implementations of GPU algorithms. However, the task of the library programmer is incredibly challenging: for each provided algorithm, high-performance implementations have to be developed for all common…

Cited by 1 publication (1 citation statement)
References 18 publications
“…Halide [31] achieves this balance by separating operator specifications (what is computed) from schedules (how, when, and where each output element is generated). This style of separation has proven highly effective across both application domains and hardware targets; numerous compilers including TVM [8], FireIron [14], LIFT [35], and Accelerate [5] follow variations of this strategy.…”
Section: Tensor IRs and Compilers (mentioning, confidence: 99%)
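The specification/schedule separation described in the citation statement can be illustrated with a minimal sketch using TVM's tensor-expression API (TVM is one of the compilers the quote names). The sketch below first states *what* is computed (a matrix multiplication) and then, separately, *how* it is computed (loop tiling and reordering). It is an illustration of the general strategy, not Fireiron's own scheduling language; the matrix size and tile factor are arbitrary choices for the example.

```python
import tvm
from tvm import te

# Specification: what is computed (C = A * B), with no decisions
# about loop structure, memory placement, or parallelism.
n = 1024
A = te.placeholder((n, n), name="A")
B = te.placeholder((n, n), name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute((n, n),
               lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
               name="C")

# Schedule: how, when, and where each output element is produced.
# Here the two spatial loops are tiled by 32 and reordered so the
# outer tile loops enclose the inner element loops.
s = te.create_schedule(C.op)
io, ii = s[C].split(C.op.axis[0], factor=32)
jo, ji = s[C].split(C.op.axis[1], factor=32)
s[C].reorder(io, jo, ii, ji)

# Lowering makes the scheduling decisions visible as a concrete loop nest,
# while the compute definition above is left untouched.
print(tvm.lower(s, [A, B, C], simple_mode=True))
```

Changing only the schedule (different tile sizes, binding loops to GPU blocks and threads, staging tiles in shared memory) yields different implementations of the same specification, which is the property the quoted passage credits to Halide and its successors.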