Proceedings of the International Workshop on OpenCL 2018
DOI: 10.1145/3204919.3204924

CLBlast

Abstract: This work introduces CLBlast, an open-source BLAS library providing optimized OpenCL routines to accelerate dense linear algebra for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the core of many applications (e.g. deep learning, iterative solvers, astrophysics, computational fluid dynamics, quantum chemistry). CLBlast has five main advantages over other OpenCL BLAS libraries: 1) it is optimized for…

Cited by 52 publications (17 citation statements)
References 18 publications
“…This results in the box-plots found in Figure 6. The mean for 7zip is 48 precision of the energy measurement setup is sufficiently high for the needs of our investigation.…”
Section: Rq1: Measurement (mentioning)
confidence: 98%
“…Typically, this takes the form of tuning parameters. CLBlast [48] is an example of a library which incorporates an auto-tuning component to optimise its OpenCL BLAS library to the target hardware. Due to this tuning, CLBlast typically outperforms its direct competitor clBLAS, up to a factor of two in some cases.…”
Section: Related Work (mentioning)
confidence: 99%
“… conv. Coulomb 3D (coulomb): direct Coulomb summation on 3D lattice, introduced in [5]. GEMM (gemm-reduced): matrix-matrix multiplication adopted from Nugteren and Codreanu [4], tuning space reduced as in [6]. Transpose (mtran): out-of-place matrix transposition, adopted from NVIDIA CUDA SDK 10.0.…”
Section: Data Description (mentioning)
confidence: 99%
“…This linear algebra computation in Caffe is done using the GEMM function which is highly-tuned in BLAS libraries. Several studies have examined these optimizations as for example the study on OpenCL GEMM version on FPGA [7].…”
Section: A Caffe's Convolution With Gemm (mentioning)
confidence: 99%
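
The last statement above refers to computing a convolution through GEMM. As a rough illustration of that idea only, the sketch below unrolls image patches into rows (commonly called im2col) and then performs a single matrix multiplication. It is not Caffe's or CLBlast's code: the function names `im2col`, `gemm`, and `conv2d_gemm` are hypothetical, and the sketch assumes a single-channel image, stride 1, no padding, and the cross-correlation convention that deep learning frameworks usually call convolution.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image (list of rows) into one row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def gemm(a, b):
    """Naive C = A * B on lists of rows; a tuned BLAS GEMM would replace this."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def conv2d_gemm(image, kernels):
    """Apply several equally sized 2D kernels with one matrix multiplication.

    Returns a matrix with one row per output pixel (row-major order)
    and one column per kernel.
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    patches = im2col(image, kh, kw)            # (out_h * out_w) x (kh * kw)
    flat = [[v for row in k for v in row] for k in kernels]
    # Transpose so each kernel becomes one column: (kh * kw) x num_kernels.
    weights = [[flat[f][p] for f in range(len(flat))]
               for p in range(kh * kw)]
    return gemm(patches, weights)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernels = [[[1, 0], [0, 1]],   # sums the main diagonal of each patch
           [[1, 1], [1, 1]]]   # sums all four values of each patch
result = conv2d_gemm(image, kernels)
```

The point of this formulation is that all the arithmetic ends up inside one GEMM call, so a tuned GEMM routine speeds up the whole convolution, at the cost of the extra memory needed for the unrolled patch matrix.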