TuningGenie: Auto-Tuning Framework Based on Rewriting Rules

Ivanenko, P.A.; Doroshenko, Anatoliy; Zhereb, Kostiantyn

doi:10.1007/978-3-319-13206-8_7

Cited by 13 publications (1 citation statement)

References 20 publications


“…Standard approaches for cuBLAS matrix multiplication optimization are described in [5], and an advanced autotuning result is presented in [6]. The autotuning approach [7] has many advantages but one drawback: the tuned library is fast "in the general sense" because the target parameter is time, whereas sometimes additional resource restrictions apply (e.g., less time but more memory, or software that depends on a "hot cache"), and a library may be used with very specific datasets. This is common for most computational tasks: Fourier transforms use only one or two dimensions, and matrices have only a few fixed dimensions.…”
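The trade-off described in this excerpt can be sketched as a constrained search. The following is a minimal, hypothetical Python sketch, not TuningGenie's rewriting-rule mechanism and not code from either paper: an exhaustive-search autotuner that minimizes a caller-supplied cost (usually run time measured on the target dataset) while rejecting configurations that exceed a memory budget. The names `autotune` and `measure_time`, the `tile` parameter, and the cost and memory models in the demo are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

# A configuration maps tunable parameter names (hypothetical, e.g. "tile") to values.
Config = Dict[str, int]


def measure_time(run: Callable[[], None], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds of `run` over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


def autotune(
    candidates: Iterable[Config],
    cost: Callable[[Config], float],
    memory_bytes: Callable[[Config], int],
    memory_budget: int,
) -> Optional[Tuple[Config, float]]:
    """Exhaustively search `candidates`, skip any whose estimated memory use
    exceeds `memory_budget`, and return the cheapest remaining configuration
    together with its cost (or None if every candidate is rejected)."""
    best: Optional[Tuple[Config, float]] = None
    for cfg in candidates:
        if memory_bytes(cfg) > memory_budget:
            continue  # resource restriction: reject even if it would be faster
        c = cost(cfg)
        if best is None or c < best[1]:
            best = (cfg, c)
    return best


if __name__ == "__main__":
    # Synthetic, deterministic model (an assumption for illustration):
    # larger tiles run faster but hold tile*tile float64 values in memory.
    tiles = [{"tile": t} for t in (8, 16, 32, 64, 128)]
    cost_model = lambda cfg: 1000 / cfg["tile"]
    memory_model = lambda cfg: cfg["tile"] ** 2 * 8

    print(autotune(tiles, cost_model, memory_model, memory_budget=16 * 1024))
    # → ({'tile': 32}, 31.25)
    print(autotune(tiles, cost_model, memory_model, memory_budget=10**9))
    # → ({'tile': 128}, 7.8125)
```

With a generous budget the same search picks `tile=128`, the "fast in the general sense" result that the excerpt cautions against when memory is restricted. In a real tuner, `cost` would wrap `measure_time` around a run on the dataset shapes the library actually sees. A configuration tuned on generic sizes may not be the best one for a few fixed dimensions.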

Section: Software Performance Modeling (mentioning; confidence: 99%)

Performance analysis of massively parallel programs for graphics processing units

Rahozin

2022


Any modern Graphics Processing Unit (graphics card) is a good platform for running massively parallel programs. Still, we lack tools to observe and measure the performance characteristics of GPU-based software. We state that, because of the complex memory hierarchy and thousands of execution threads, all performance issues come down to efficient use of the graphics card memory hierarchy. We propose using the GPGPU-Sim simulator, previously used mostly for validating graphics card architectures, to validate the performance of CUDA-based programs. We provide examples that show how to use simulation for performance analysis of massively parallel programs.

