Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems 2016
DOI: 10.1145/2968455.2968521
Matrix multiplication beyond auto-tuning

Abstract: Graphics Processing Units (GPUs) are used as general-purpose parallel accelerators in a wide range of applications. They are found in most computing systems, and mobile devices are no exception. The recent availability of programming APIs such as OpenCL for mobile GPUs promises to open up new classes of applications on these devices. However, producing high-performance GPU code is extremely difficult: subtle differences in device characteristics can lead to large performance variations when different optimization…

Cited by 15 publications (1 citation statement); references 29 publications.
“…In addition, several parallel programming frameworks exist [11,17,23,29,38,44,45,47] that enable the compilation of domain-specific languages on GPUs. Lift [26,46] extends its existing data-parallel primitive types to accommodate loop tiling (e.g., slide, pad) and its low-level OpenCL with local memory allocation (e.g., toLocal) for stencil computations.…”
Section: GPU Features into Programming Languages
Confidence: 99%
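The citation statement above refers to loop tiling, the optimization that primitives such as slide and pad express in Lift and that tuned GPU matrix-multiplication kernels rely on. As a rough illustration only, the sketch below shows loop tiling in plain Python; the tile size is an assumed tuning parameter, and on a GPU the tiles would correspond to work-groups staging data in local memory (cf. toLocal) rather than to Python loops.

```python
def tiled_matmul(A, B, n, tile=2):
    """Multiply two n x n matrices (lists of lists) with loop tiling.

    `tile` is a hypothetical tuning parameter: it bounds the working set
    touched by the innermost loops, which is what makes tiling profitable
    on hardware with small fast memories.
    """
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):            # tile over rows of C
        for jj in range(0, n, tile):        # tile over columns of C
            for kk in range(0, n, tile):    # tile over the reduction dim
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

Choosing `tile` is exactly the kind of device-specific decision the paper's abstract points to: the best value differs across GPUs, which is why auto-tuning (and approaches beyond it) matter.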