Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis 2013
DOI: 10.1145/2503210.2503219

AUGEM

Abstract: Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In this paper, we present a template-based optimization framework, AUGEM, which can automatically generate fully optimized assembly code for several dense linear algebra (DLA) kernels, such as GEMM, GEMV, AXPY and DOT, on varying multi-core CPUs without requiring any manual interference from developers. In particular, based on domain-specific knowledge about algorithms of the DLA kernels, we use a collection of parameteriz…

Cited by 137 publications (19 citation statements)
References 21 publications
“…Our Fortran 90 code is linked with OpenBLAS [102,103] and Expokit [104]. Random numbers were generated with the 48-bit Linear Congruential Generator with Prime Addend, as implemented in SPRNG5.…”
Section: Computational Details (mentioning)
confidence: 99%
“…It has been argued that empirical search is the only way to obtain highly optimized implementations for DLA operations [Demmel et al 2005; Bilmes et al 1997a; Whaley and Dongarra 1998], and an increasing number of recent projects (Build-To-Order BLAS [Belter et al 2010] and AuGEM [Wang et al 2013]) now adopt empirical search to identify optimal parameter values for DLA algorithms. The problem with empirical-based approaches is that they unleash a walloping search space, due to the combination of a large number of possible values for a substantial set of parameters.…”
Section: Approaches To Identify the Optimal Parameter Values For GEMM (mentioning)
confidence: 99%
“…We note with interest that OpenBLAS utilized an 8 × 2 micro-kernel for the AMD Kaveri. While it differs from our analytical 4 × 6 micro-kernel, we note that AuGEM [Wang et al 2013], an empirical search tool developed by the authors of OpenBLAS in order to automatically generate the micro-kernel, generated a micro-kernel that operates on a 6 × 4 micro-tile. This suggests that the original 8 × 2 micro-kernel currently used by OpenBLAS may not be optimal for the AMD architecture.…”
Section: Evaluating the Model For Mr And Nr (mentioning)
confidence: 99%
“…Even though the practical value of ω is larger than the asymptotic one, matrix multiplication routines have been the subject of intense implementation development for decades, and highly-tuned software is readily available for a variety of architectures [Dumas, Giorgi, and Pernet, 2008; Wang, Zhang, Zhang, and Yi, 2013]. Running times based on n^ω have practical as well as theoretical significance; the ω indicates where an algorithm is able to take advantage of fast low-level matrix multiplication routines.…”
Section: Related Work (mentioning)
confidence: 99%