A framework for enhancing data reuse via associative reordering

Stock, Kevin; Kong, Martin; Grosser, Tobias; Pouchet, Louis-Noël; Rastello, Fabrice; Ramanujam, J.; Sadayappan, P.

doi:10.1145/2594291.2594342

Cited by 43 publications

(46 citation statements)

References 41 publications


“…More recently, extensions to the polyhedral framework have been proposed, allowing it to capture reduction computations [11,17,48]. Such efforts are described in [13], but they are fragile in the presence of non static control flow.…”

Section: Related and Future Work (mentioning)

confidence: 99%

Automatic Matching of Legacy Code to Heterogeneous APIs

Ginsbach

Remmelg

Steuwer

et al. 2018

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Syste


Heterogeneous accelerators often disappoint. They provide the prospect of great performance, but only deliver it when using vendor specific optimized libraries or domain specific languages. This requires considerable legacy code modifications, hindering the adoption of heterogeneous computing. This paper develops a novel approach to automatically detect opportunities for accelerator exploitation. We focus on calculations that are well supported by established APIs: sparse and dense linear algebra, stencil codes and generalized reductions and histograms. We call them idioms and use a custom constraint-based Idiom Description Language (IDL) to discover them within user code. Detected idioms are then mapped to BLAS libraries, cuSPARSE and clSPARSE and two DSLs: Halide and Lift. We implemented the approach in LLVM and evaluated it on the NAS and Parboil sequential C/C++ benchmarks, where we detect 60 idiom instances. In those cases where idioms are a significant part of the sequential execution time, we generate code that achieves 1.26× to over 20× speedup on integrated and external GPUs.

CCS Concepts: Computer systems organization → Heterogeneous (hybrid) systems; Software and its engineering → Domain specific languages


“…More recently, extensions to the polyhedral framework have been proposed, allowing it to capture some reduction computations [8,14,32]. Such efforts are described in [12].…”

Section: Related Work (mentioning)

confidence: 99%

Discovery and exploitation of general reductions: A constraint based approach

Ginsbach

O’Boyle

2017

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)


Discovering and exploiting scalar reductions in programs has been studied for many years. The discovery of more complex reduction operations has, however, received less attention. Such reductions contain compile-time unknown parameters, indirect memory accesses and dynamic control flow, which are challenging for existing approaches. In this paper we develop a new compiler based approach that automatically detects a wide class of reductions. This is based on a constraint formulation of the reduction idiom and has been implemented as an LLVM pass. We use a custom constraint solver to identify program subsets that adhere to the constraint specification. Once discovered, we automatically generate parallel code to exploit the reduction. This approach is robust and was evaluated on C versions of well known benchmark suites: NAS, Parboil and Rodinia. We detected 84 scalar reductions and 6 histograms, outperforming existing approaches. We show that the exploitation of histograms gives significant performance improvement.
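As a rough illustration of why a histogram counts as a reduction that can be parallelized, the sketch below compares a sequential histogram loop with a privatized version that builds one partial histogram per chunk and then merges them. It is not the code generated by the LLVM pass described in the abstract; the function names, the chunking scheme, and the `num_chunks` parameter are assumptions made for this example. The chunks are processed one after another here, but each chunk touches only its own private bins, so they could run concurrently.

```python
def histogram_sequential(keys, num_bins):
    """Reference loop: the bin that gets updated depends on the data (indirect access)."""
    bins = [0] * num_bins
    for k in keys:
        bins[k] += 1
    return bins


def histogram_privatized(keys, num_bins, num_chunks=4):
    """Hypothetical privatization scheme: one private histogram per chunk, merged at the end.

    Integer addition is associative and commutative, so merging the partial
    histograms in any order gives the same result as the sequential loop.
    """
    chunk = (len(keys) + num_chunks - 1) // num_chunks  # ceiling division
    partials = []
    for c in range(num_chunks):
        private = [0] * num_bins  # no sharing between chunks
        for k in keys[c * chunk:(c + 1) * chunk]:
            private[k] += 1
        partials.append(private)
    # Merge step: element-wise sum of the private histograms.
    return [sum(p[b] for p in partials) for b in range(num_bins)]
```

The merge costs one addition per bin per chunk, so the scheme pays off when the input is much larger than the number of bins.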


“…This optimization exploits data reuse, thus improving memory hierarchy performance, while also eliminating redundant floating-point computation. Reducing floating-point computation is particularly valuable for large higher-order stencils, as they can be compute bound, and their computation can also stress the register capacity [15].…”

Section: Introduction (mentioning)

confidence: 99%

“…• We describe the partial sum optimization within the CHiLL compiler [18], which goes beyond related manual [19][20][21] and compiler optimizations [14,15] by simultaneously addressing DRAM and cache bandwidth while reducing floating-point computation and facilitating SIMDization.…”

Section: Introduction (mentioning)

confidence: 99%

Compiler-Directed Transformation for Higher-Order Stencils

Basu

Hall

Williams

et al. 2015

2015 IEEE International Parallel and Distributed Processing Symposium


As the cost of data movement increasingly dominates performance, developers of finite-volume and finite-difference solutions for partial differential equations (PDEs) are exploring novel higher-order stencils that increase numerical accuracy and computational intensity. This paper describes a new compiler reordering transformation applied to stencil operators that performs partial sums in buffers, and reuses the partial sums in computing multiple results. This optimization has multiple effects on improving stencil performance that are particularly important to higher-order stencils: exploits data reuse, reduces floating-point operations, and exposes efficient SIMD parallelism to backend compilers. We study the benefit of this optimization in the context of Geometric Multigrid (GMG), a widely used method to solve PDEs, using four different Jacobi smoothers built from 7-, 13-, 27-, and 125-point stencils. We quantify performance, speedup, and numerical accuracy, and use the Roofline model to qualify our results. Ultimately, we obtain over 4× speedup on the smoothers themselves and up to a 3× speedup on the multigrid solver. Finally, we demonstrate that high-order multigrid solvers have the potential of reducing total data movement and energy by several orders of magnitude.
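The partial-sum idea in this abstract can be sketched on a much smaller case than the stencils it studies. The example below assumes a 2D 9-point box stencil with unit weights, which is a simplification chosen for illustration and not one of the paper's Jacobi smoothers or the output of CHiLL. The function names are hypothetical. Each column sum is computed once per row and then reused by three neighboring output points, which cuts the additions per point from eight to four.

```python
def box_naive(a):
    """Sum the 3x3 neighborhood of every interior point directly (8 additions per point)."""
    n, m = len(a), len(a[0])
    out = [[0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = sum(a[i + di][j + dj]
                            for di in (-1, 0, 1)
                            for dj in (-1, 0, 1))
    return out


def box_partial_sums(a):
    """Same stencil, reordered: buffer vertical partial sums, then combine three of them."""
    n, m = len(a), len(a[0])
    out = [[0] * m for _ in range(n)]
    for i in range(1, n - 1):
        # Partial-sum buffer for row i: one vertical 3-point sum per column (2 additions each).
        col = [a[i - 1][j] + a[i][j] + a[i + 1][j] for j in range(m)]
        for j in range(1, m - 1):
            # Each buffered sum is reused by outputs j-1, j, and j+1 (2 more additions).
            out[i][j] = col[j - 1] + col[j] + col[j + 1]
    return out
```

With integer inputs the two versions agree exactly. With floating-point inputs the reordering relies on treating addition as associative, so results may differ in the last bits, which is why the abstract reports numerical accuracy alongside speedup.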
