Multicore and Accelerator Development for a Leadership-Class Stellar Astrophysics Code
2013 · DOI: 10.1007/978-3-642-36803-5_6

Cited by 27 publications (13 citation statements) · References 20 publications
“…To exploit the parallelism and near-processor memory of powerful high-performance computing systems, it is common for application programs to bind many small operations into one cumulatively large computation. This trend gives rise to batched matrix multiplications, which appear frequently in, e.g., quantum chemistry [13], astrophysics [39], metabolic networks [32], computational fluid dynamics [44], domain decomposition solvers [10], tensor computations [45], and deep learning [6,11]. It has been shown that in these applications performance can be greatly improved by exploiting batched computations of small matrix multiplications [8,19,37].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
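The batching pattern this excerpt describes is easy to sketch in CUDA: rather than launching one kernel per small multiplication, a whole batch is submitted at once and each thread block computes one small product. A minimal sketch, assuming square row-major matrices of a fixed small size N stored contiguously; the kernel name and sizes are illustrative, not taken from the cited papers:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 16  // small matrix dimension (illustrative)

// One thread block computes C[b] = A[b] * B[b] for one N x N matrix in the
// batch; thread (col, row) produces a single output element. Matrices are
// stored contiguously in row-major order, batch after batch.
__global__ void batchedGemmKernel(const double *A, const double *B,
                                  double *C, int batchCount) {
    int b = blockIdx.x;                        // one block per matrix
    if (b >= batchCount) return;
    int row = threadIdx.y, col = threadIdx.x;
    const double *Ab = A + (size_t)b * N * N;
    const double *Bb = B + (size_t)b * N * N;
    double acc = 0.0;
    for (int k = 0; k < N; ++k)
        acc += Ab[row * N + k] * Bb[k * N + col];
    C[(size_t)b * N * N + row * N + col] = acc;
}

int main() {
    const int batchCount = 1000;
    size_t elems = (size_t)batchCount * N * N;
    double *A, *B, *C;
    cudaMallocManaged(&A, elems * sizeof(double));
    cudaMallocManaged(&B, elems * sizeof(double));
    cudaMallocManaged(&C, elems * sizeof(double));
    for (size_t i = 0; i < elems; ++i) { A[i] = 1.0; B[i] = 2.0; }

    dim3 threads(N, N);                        // one thread per output element
    batchedGemmKernel<<<batchCount, threads>>>(A, B, C, batchCount);
    cudaDeviceSynchronize();

    printf("C[0](0,0) = %f (expect %f)\n", C[0], 2.0 * N);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Submitting one block per matrix amortizes the launch overhead across the whole batch; for matrices this small, launching a separate kernel per multiplication would be dominated by launch latency rather than arithmetic.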
“…Moreover, there are good reasons to believe that neither improved compiler technology nor autotuning will make any significant headway on this problem. This lack of coverage by current library infrastructure is especially alarming given the number of applications from important fields that fit this profile, including deep learning [8], data mining [31], astrophysics [23], image and signal processing [4], [24], hydrodynamics [10], quantum chemistry [5], and computational fluid dynamics (CFD), where the resulting partial differential equations (PDEs) are handled through direct and multifrontal solvers [42], to name a few. Dramatically better performance on these applications can be achieved by using software that can repetitively execute small matrix/tensor operations grouped together in "batches."…”
Section: Introduction (citation type: mentioning; confidence: 99%)
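In practice the grouping into "batches" usually goes through a library interface rather than a hand-written kernel. cuBLAS, for instance, exposes cublasDgemmBatched, which takes device arrays of per-matrix pointers. A hedged sketch of the call (error checking omitted; the matrix contents and sizes are illustrative, and the pointer-array setup is the part specific to the batched interface):

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Multiply a batch of small n x n matrices with a single cuBLAS call.
int main() {
    const int n = 32, batchCount = 500;
    size_t elems = (size_t)batchCount * n * n;

    double *A, *B, *C;
    cudaMallocManaged(&A, elems * sizeof(double));
    cudaMallocManaged(&B, elems * sizeof(double));
    cudaMallocManaged(&C, elems * sizeof(double));
    for (size_t i = 0; i < elems; ++i) { A[i] = 1.0; B[i] = 1.0; C[i] = 0.0; }

    // The batched interface takes device-visible arrays of pointers,
    // one pointer per matrix in the batch.
    std::vector<const double*> hA(batchCount), hB(batchCount);
    std::vector<double*> hC(batchCount);
    for (int b = 0; b < batchCount; ++b) {
        hA[b] = A + (size_t)b * n * n;
        hB[b] = B + (size_t)b * n * n;
        hC[b] = C + (size_t)b * n * n;
    }
    const double **dA, **dB;
    double **dC;
    cudaMalloc((void**)&dA, batchCount * sizeof(double*));
    cudaMalloc((void**)&dB, batchCount * sizeof(double*));
    cudaMalloc((void**)&dC, batchCount * sizeof(double*));
    cudaMemcpy(dA, hA.data(), batchCount * sizeof(double*), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), batchCount * sizeof(double*), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC.data(), batchCount * sizeof(double*), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;
    // C[b] = alpha * A[b] * B[b] + beta * C[b], for every b, in one call.
    cublasDgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                       &alpha, dA, n, dB, n, &beta, dC, n, batchCount);
    cudaDeviceSynchronize();

    printf("C[0](0,0) = %f (expect %d)\n", C[0], n);  // all-ones inputs
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

MAGMA and vendor BLAS implementations provide analogous batched routines; the common design point is that a pointer array, rather than a loop over individual GEMM calls, is what lets the library schedule all the small products together.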
“…Also, in combustion and astrophysical supernova applications [6], [7], [17], [23], [32], the study of thermonuclear reaction networks (the XNet package) requires the solution of many sparse linear systems of size around 150 × 150. Furthermore, the need for batched routines can be illustrated in radar signal processing [4], where a batch of 200 × 200 QR decompositions is needed, as well as in hydrodynamic simulations [10], where thousands of matrix-matrix (GEMM) and matrix-vector (GEMV) products of matrices of around 100 × 100 are needed.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
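The reaction-network workload quoted above (many independent systems of around 150 × 150) maps naturally onto batched factorization routines. A hedged sketch using cuBLAS's cublasDgetrfBatched and cublasDgetrsBatched, treating each system as dense for simplicity; the quoted systems are sparse, and a production code such as XNet would exploit that structure:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Solve a batch of small dense systems A[b] x[b] = rhs[b] with one batched
// LU factorization (getrf) followed by one batched solve (getrs).
int main() {
    const int n = 150, batchCount = 256;

    double *A, *rhs;
    cudaMallocManaged(&A,   (size_t)batchCount * n * n * sizeof(double));
    cudaMallocManaged(&rhs, (size_t)batchCount * n * sizeof(double));
    // Symmetric, diagonally dominant test matrices (n on the diagonal,
    // 1 elsewhere) with all-ones right-hand sides; by symmetry every
    // solution entry should equal 1 / (2n - 1).
    for (int b = 0; b < batchCount; ++b) {
        for (size_t i = 0; i < (size_t)n * n; ++i)
            A[(size_t)b * n * n + i] = (i % (n + 1) == 0) ? (double)n : 1.0;
        for (int i = 0; i < n; ++i) rhs[(size_t)b * n + i] = 1.0;
    }

    // Per-matrix and per-vector pointer arrays, as the batched API expects.
    std::vector<double*> hA(batchCount), hX(batchCount);
    for (int b = 0; b < batchCount; ++b) {
        hA[b] = A   + (size_t)b * n * n;
        hX[b] = rhs + (size_t)b * n;
    }
    double **dA, **dX;
    cudaMalloc((void**)&dA, batchCount * sizeof(double*));
    cudaMalloc((void**)&dX, batchCount * sizeof(double*));
    cudaMemcpy(dA, hA.data(), batchCount * sizeof(double*), cudaMemcpyHostToDevice);
    cudaMemcpy(dX, hX.data(), batchCount * sizeof(double*), cudaMemcpyHostToDevice);

    int *pivots, *infoLU;                      // device-side pivots and statuses
    cudaMalloc((void**)&pivots, (size_t)batchCount * n * sizeof(int));
    cudaMalloc((void**)&infoLU, batchCount * sizeof(int));

    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasDgetrfBatched(handle, n, dA, n, pivots, infoLU, batchCount);
    int infoSolve = 0;                         // host-side status for the solve
    cublasDgetrsBatched(handle, CUBLAS_OP_N, n, 1,
                        (const double* const*)dA, n, pivots,
                        dX, n, &infoSolve, batchCount);
    cudaDeviceSynchronize();

    printf("x[0][0] = %f (expect %f)\n", rhs[0], 1.0 / (2 * n - 1));
    cublasDestroy(handle);
    cudaFree(pivots); cudaFree(infoLU); cudaFree(dA); cudaFree(dX);
    cudaFree(A); cudaFree(rhs);
    return 0;
}
```

Factoring all 256 systems in one call and then solving them in a second is the same amortization idea as batched GEMM: the per-system work is far too small to occupy a GPU on its own.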
“…Many numerical libraries and applications already use this functionality and need it developed further. Examples include the tile algorithms from the area of dense linear algebra [2], various register and cache blocking techniques for sparse computations [11], sparse direct multifrontal solvers [30], high-order FEM [7], and numerous applications from, e.g., astrophysics [17], hydrodynamics [7], image processing [18], signal processing [5], etc.…”
Section: Introduction (citation type: mentioning; confidence: 99%)