Abstract—Caffe is a deep learning framework, originally developed at UC Berkeley, that is widely used in large-scale industrial applications such as vision, speech, and multimedia. It supports many types of deep learning architectures, such as CNNs (convolutional neural networks) geared towards image classification and recognition. In this paper we develop a platform for the efficient deployment and acceleration of the Caffe framework on embedded systems based on the Zynq SoC. The most computationally intensive part of image classification is the processing of the convolution layers of the deep learning algorithms, and more specifically the GEMM (general matrix multiplication) function calls. In the proposed framework, a hardware accelerator has been implemented, validated, and optimized using the Xilinx SDSoC Development Environment to perform the GEMM function. The accelerator achieves up to 98× speed-up compared with a plain ARM CPU implementation. The results show that mapping Caffe onto the FPGA-based Zynq takes advantage of the low-power, customizable, and programmable fabric, and ultimately reduces the time and power consumption of image classification.