AuGEM

Wang, Qian; Zhang, Xianyi; Zhang, Yunquan; Yi, Qing

doi:10.1145/2503210.2503219

Cited by 137 publications

(19 citation statements)

References 21 publications


“…Our Fortran 90 code is linked with OpenBLAS [102,103] and Expokit. [104] Random numbers were generated with the 48 bit Linear Congruential Generator with Prime Addend, as implemented in SPRNG5.…”

Section: Computational Details (mentioning)

confidence: 99%

Chemical Transformations Approaching Chemical Accuracy via Correlated Sampling in Auxiliary-Field Quantum Monte Carlo

Shee, Zhang, Reichman et al. 2017

J. Chem. Theory Comput.


The exact and phaseless variants of Auxiliary-Field Quantum Monte Carlo (AFQMC) have been shown to be capable of producing accurate ground-state energies for a wide variety of systems including those which exhibit substantial electron correlation effects. The computational cost of performing these calculations has to date been relatively high, impeding many important applications of these approaches. Here we present a correlated sampling methodology for AFQMC which relies on error cancellation to dramatically accelerate the calculation of energy differences of relevance to chemical transformations. In particular, we show that our correlated sampling-based AFQMC approach is capable of calculating redox properties, deprotonation free-energies, and hydrogen abstraction energies in an efficient manner without sacrificing accuracy. We validate the computational protocol by calculating the ionization potentials and electron affinities of the atoms contained in the G2 Test Set, and then proceed to utilize a composite method, which treats fixed-geometry processes with correlated sampling-based AFQMC and relaxation energies via MP2, to compute the ionization potential, deprotonation free-energy, and the O-H bond dissociation energy of methanol, all to within chemical accuracy. We show that the efficiency of correlated sampling relative to uncorrelated calculations increases with system and basis set size, and that correlated sampling greatly reduces the required number of random walkers to achieve a target statistical error. This translates to CPU-time speed-up factors of 55, 25, and 24 for the ionization potential of the K atom, the deprotonation of methanol, and hydrogen abstraction from the O-H bond of methanol, respectively. We conclude with a discussion of further efficiency improvements that may open the door to the accurate description of chemical processes in complex systems.



“…It has been argued that empirical search is the only way to obtain highly optimized implementations for DLA operations [Demmel et al. 2005; Bilmes et al. 1997a; Whaley and Dongarra 1998], and an increasing number of recent projects (Build-To-Order BLAS [Belter et al. 2010] and AuGEM [Wang et al. 2013]) now adopt empirical search to identify optimal parameter values for DLA algorithms. The problem with empirical-based approaches is that they unleash a walloping search space, due to the combination of a large number of possible values for a substantial set of parameters.…”

Section: Approaches to Identify the Optimal Parameter Values for GEMM (mentioning)

confidence: 99%
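To make the size of that search space concrete, here is a toy empirical autotuner. It is a minimal sketch, not AuGEM's search procedure: it assumes a pure-Python blocked matrix multiply with three cache-blocking sizes, named (mc, kc, nc) after the GotoBLAS/BLIS convention, times every combination of candidate values, and keeps the fastest. The function names and candidate lists are hypothetical. With c candidates per parameter it already visits c³ configurations; the excerpt notes that real DLA kernels expose a substantial set of parameters, so the product grows much faster.

```python
import itertools
import random
import time


def blocked_matmul(A, B, n, mc, kc, nc):
    """Return C = A @ B for n x n lists of lists, tiled by block sizes (mc, kc, nc)."""
    C = [[0.0] * n for _ in range(n)]
    for jj in range(0, n, nc):            # block of columns of B and C
        for kk in range(0, n, kc):        # block of the shared k dimension
            for ii in range(0, n, mc):    # block of rows of A and C
                for i in range(ii, min(ii + mc, n)):
                    a_row, c_row = A[i], C[i]
                    for k in range(kk, min(kk + kc, n)):
                        a_ik, b_row = a_row[k], B[k]
                        for j in range(jj, min(jj + nc, n)):
                            c_row[j] += a_ik * b_row[j]
    return C


def empirical_search(n, candidates, repeats=2, seed=0):
    """Time every (mc, kc, nc) drawn from `candidates`; return (fastest, seconds)."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n)] for _ in range(n)]
    B = [[rng.random() for _ in range(n)] for _ in range(n)]
    best, best_time = None, float("inf")
    # The search space is the Cartesian product: len(candidates) ** 3 points.
    for params in itertools.product(candidates, repeat=3):
        start = time.perf_counter()
        for _ in range(repeats):
            blocked_matmul(A, B, n, *params)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best, best_time = params, elapsed
    return best, best_time


if __name__ == "__main__":
    best, seconds = empirical_search(24, [4, 8, 16])  # 27 configurations
    print(f"best (mc, kc, nc) = {best}, {seconds * 1e3:.2f} ms per run")
```

In pure Python the timings mostly reflect interpreter overhead rather than cache behavior, so the winner here says little about real hardware; the sketch only shows the mechanics of exhaustive empirical search and why its cost multiplies with every added parameter.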

“…We note with interest that OpenBLAS utilized an 8 × 2 micro-kernel for the AMD Kaveri. While it differs from our analytical 4 × 6 micro-kernel, we note that AuGEM [Wang et al 2013], an empirical search tool developed by the authors of OpenBLAS in order to automatically generate the micro-kernel, generated a micro-kernel that operates on a 6 × 4 micro-tile. This suggests that the original 8 × 2 micro-kernel currently used by OpenBLAS may not be optimal for the AMD architecture.…”

Section: Evaluating the Model for m_R and n_R (mentioning)

confidence: 99%

Analytical Modeling Is Enough for High-Performance BLIS

Low, Igual, Smith et al. 2016

ACM Trans. Math. Softw.


We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine optimal tuning parameters for high-end instantiations of the matrix-matrix multiplication. This is of both practical and scientific importance, as it greatly reduces the development effort required for the implementation of the level-3 BLAS while also advancing our understanding of how hierarchically layered memories interact with high performance software. This allows the community to move on from valuable engineering solutions (empirical autotuning) to scientific understanding (analytical insight).
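The excerpts above describe GEMM micro-kernels by the shape of the micro-tile of C they update: 8 × 2 in OpenBLAS for the AMD Kaveri, 4 × 6 from the analytical model, and 6 × 4 from AuGEM. A minimal sketch of what that shape means, assuming BLIS-style packing (A copied into an mr-wide micro-panel, B into an nr-wide micro-panel) and skipping the outer cache-blocking loops and edge cases: the kernel performs one rank-1 update of an mr × nr accumulator per step of k. Real micro-kernels keep that accumulator in SIMD registers and are written in assembly or intrinsics; this Python version only mirrors the loop structure, and all function names are hypothetical.

```python
def micro_kernel(kc, mr, nr, a_packed, b_packed, C, ci, cj):
    """C[ci:ci+mr][cj:cj+nr] += (packed A micro-panel) @ (packed B micro-panel).

    a_packed holds element (i, p) at index p*mr + i (kc columns of height mr).
    b_packed holds element (p, j) at index p*nr + j (kc rows of width nr).
    """
    acc = [[0.0] * nr for _ in range(mr)]  # stands in for the mr x nr register tile
    for p in range(kc):                    # one rank-1 update per step of k
        a_col = a_packed[p * mr:(p + 1) * mr]
        b_row = b_packed[p * nr:(p + 1) * nr]
        for i in range(mr):
            a_ip, acc_row = a_col[i], acc[i]
            for j in range(nr):
                acc_row[j] += a_ip * b_row[j]
    for i in range(mr):                    # write the finished tile back to C
        for j in range(nr):
            C[ci + i][cj + j] += acc[i][j]


def pack_a(A, i0, mr, kc):
    """Copy rows i0..i0+mr-1 of A into the column-by-column micro-panel layout."""
    return [A[i0 + i][p] for p in range(kc) for i in range(mr)]


def pack_b(B, j0, nr, kc):
    """Copy columns j0..j0+nr-1 of B into the row-by-row micro-panel layout."""
    return [B[p][j0 + j] for p in range(kc) for j in range(nr)]


def gemm_with_micro_kernel(A, B, m, n, k, mr, nr):
    """C = A @ B (m x k times k x n) built entirely from mr x nr micro-kernel calls."""
    assert m % mr == 0 and n % nr == 0, "sketch omits edge handling"
    C = [[0.0] * n for _ in range(m)]
    for j0 in range(0, n, nr):
        b = pack_b(B, j0, nr, k)
        for i0 in range(0, m, mr):
            micro_kernel(k, mr, nr, pack_a(A, i0, mr, k), b, C, i0, j0)
    return C
```

For example, `gemm_with_micro_kernel(A, B, 8, 4, 3, mr=8, nr=2)` drives an 8 × 2 kernel and `mr=4, nr=2` a 4 × 2 one over the same product; the result is identical, and only the register-tile shape (and hence, on real hardware, the ratio of FMAs to loads) changes.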


“…Even though the practical value of ω is larger than the asymptotic one, matrix multiplication routines have been the subject of intense implementation development for decades, and highly-tuned software is readily available for a variety of architectures [Dumas, Giorgi, and Pernet, 2008; Wang, Zhang, Zhang, and Yi, 2013]. Running times based on n^ω have practical as well as theoretical significance; the ω indicates where an algorithm is able to take advantage of fast low-level matrix multiplication routines.…”

Section: Related Work (mentioning)

confidence: 99%

Error Correction in Fast Matrix Multiplication and Inverse

Roche 2018

Proceedings of the 2018 ACM International Symposium on Symbolic and Algebraic Computation


We present new algorithms to detect and correct errors in the product of two matrices, or the inverse of a matrix, over an arbitrary field. Our algorithms do not require any additional information or encoding other than the original inputs and the erroneous output. Their running time is softly linear in the number of nonzero entries in these matrices when the number of errors is sufficiently small, and they also incorporate fast matrix multiplication so that the cost scales well when the number of errors is large. These algorithms build on the recent result of Gasieniec, Levcopoulos, Lingas, Pagh, and Tokuyama [2017] on correcting matrix products, as well as existing work on verification algorithms, sparse low-rank linear algebra, and sparse polynomial interpolation.

    14  for i ← 1, 2, …, r do
    15  Set (J_i, e)th entry of E to c for each term c·x^e of f_i
    16  J ← FindNonzeroRows(V ↦ (C − E)V − A(BV), ε_1)
    17  if #J > r/2 then
    18  k ← 2k
    19  if k ≥ 2n#J then return C − AB
    20  foreach i ∈ J do
    21  Clear entries from row i of E added on this iteration
    22  return E

    17  for i ← 1, 2, …, r do
    18  Set (J_i, e)th entry of E to c for each term c·x^e of f_i
    19  J ← FindNonzeroRows(V ↦ V − (B + E)(AV), ε_1)
    20  if #J > r/2 then
    21  k ← 2k
    22  if k ≥ 2n#J then return A⁻¹ − B
    23  foreach i ∈ J do
    24  Clear entries from row i of E added on this iteration
    25  return E
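The abstract says these algorithms build on existing verification algorithms, and the excerpted steps call FindNonzeroRows on a map of the form V ↦ (C − E)V − A(BV). One classical verification technique of this kind is Freivalds' randomized check: multiply the residual C − AB by a random vector, which costs only matrix-vector products. The sketch below is not Roche's algorithm; it assumes matrices with integer entries interpreted over the prime field Z/pZ (with a hypothetical default prime p = 2³¹ − 1) and hypothetical function names, and it only flags rows of C that disagree with AB. A row that is correct can never be flagged; a wrong row is missed by one trial with probability at most 1/p.

```python
import random


def mat_vec(M, v, p):
    """Return M v over Z/pZ for a list-of-lists matrix M and vector v."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]


def find_erroneous_rows(A, B, C, p=2_147_483_647, trials=2, seed=None):
    """Flag rows i where C[i] != (A B)[i] over Z/pZ, Freivalds-style.

    Each trial computes C v - A (B v) for a random v using three
    matrix-vector products, so no full matrix product is ever formed.
    """
    rng = random.Random(seed)
    n = len(B[0])
    flagged = set()
    for _ in range(trials):
        v = [rng.randrange(p) for _ in range(n)]
        Cv = mat_vec(C, v, p)
        ABv = mat_vec(A, mat_vec(B, v, p), p)  # A(Bv), never AB itself
        for i, (x, y) in enumerate(zip(Cv, ABv)):
            if (x - y) % p != 0:                # residual row i is nonzero
                flagged.add(i)
    return sorted(flagged)
```

Each trial costs O(mn + nk + mk) arithmetic operations rather than a full matrix product; repeating trials drives the chance of missing an erroneous row down geometrically, which is the trade-off a verification step of this kind exploits.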

