Automatic Energy Efficient Parallelization of Uniform Dependence Computations

Zou, Yun; Rajopadhye, Sanjay

doi:10.1145/2751205.2751245

Cited by 3 publications

(4 citation statements)

References 31 publications

(54 reference statements)


“…We also provide variability of both speed and cache behavior with respect to tile size. Some prior work on code generators for various types of tiling have empirically compared iteration space tiling and cache-oblivious methods [2,11,44,58,73]. The objective of these experiments are slightly different from ours; the main target of evaluation is the code generation tools.…”

Section: Empirical Study Of Cache-oblivious Methods (mentioning)

confidence: 99%

PCOT: Cache Oblivious Tiling of Polyhedral Programs

Ranasinghe, Prajapati, Yuki et al. 2018

Preprint

Self Cite


This paper studies two variants of tiling: iteration space tiling (or loop blocking) and cache oblivious methods that recursively split the iteration space with divide-and-conquer. The key question to answer is when we should be using one over the other. The answer to this question is complicated for modern architectures for a number of reasons. In this paper, we present a detailed empirical study to answer this question for a range of kernels that fit the polyhedral model. Our study is based on a generalized cache oblivious code generator that supports this class, which is a superset of those supported by existing tools. The conclusion is that cache oblivious code is most useful when the aim is to reduce off-chip memory accesses, e.g., for lower energy, although certain situations diminish its effectiveness.
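The two traversal styles named in this abstract can be contrasted with a minimal sketch. It is not the PCOT code generator: the function names, the square 2D iteration space, and the `base` cutoff are assumptions made for illustration, and the sketch only records visiting order without modeling dependences or cache behavior.

```python
def tiled_order(n, tile):
    """Visit an n x n iteration space tile by tile (loop blocking).

    The tile size is an explicit parameter that must be tuned.
    """
    order = []
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    order.append((i, j))
    return order


def recursive_order(i0, i1, j0, j1, base=2, order=None):
    """Visit [i0, i1) x [j0, j1) by recursively halving the longer side.

    No tile size is chosen; recursion stops at a small base case
    (hypothetical cutoff `base`, which must be at least 1).
    """
    if order is None:
        order = []
    if i1 - i0 <= base and j1 - j0 <= base:
        for i in range(i0, i1):
            for j in range(j0, j1):
                order.append((i, j))
    elif i1 - i0 >= j1 - j0:
        mid = (i0 + i1) // 2
        recursive_order(i0, mid, j0, j1, base, order)
        recursive_order(mid, i1, j0, j1, base, order)
    else:
        mid = (j0 + j1) // 2
        recursive_order(i0, i1, j0, mid, base, order)
        recursive_order(i0, i1, mid, j1, base, order)
    return order
```

Both functions visit every point exactly once and differ only in the order, which is what determines data reuse at each level of the memory hierarchy.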


“…Lastly, our runtime scheduling policy mandates that each processor must follow a strictly lexicographically ascending order within the block and finish all of the work within a block before being preempted. Such policy essentially guarantees a multi-pass execution of the iteration space, which was previously proven [57] to exhibit energy efficiency but was only applicable to stencil kernels.…”

Section: Experimental Evaluation (mentioning)

confidence: 99%

“…As many authors note, such static control structures have a number of drawbacks [3,7,8,16,30,57]. First, they induce unnecessary synchronization: any tile of wavefront w must wait for all tiles of wavefront w − 1.…”

Section: Introduction (mentioning)

confidence: 99%


Hybrid Static/Dynamic Schedules for Tiled Polyhedral Programs

Jin, Prajapati, Ranasinghe et al. 2016

Preprint

Self Cite


Polyhedral compilers perform optimizations such as tiling and parallelization; when doing both, they usually generate code that executes "barrier-synchronized wavefronts" of tiles. We present a system to express and generate code for hybrid schedules, where some constraints are automatically satisfied through the structure of the code, and the remainder are dynamically enforced at run-time with data flow mechanisms. We prove bounds on the added overheads that are better, by at least one polynomial degree, than those of previous techniques. We propose a generic mechanism to implement the needed synchronization, and show it can be easily realized for a variety of targets: OpenMP, Pthreads, GPU (CUDA or OpenCL) code, languages like X10, Habanero, Cilk, as well as data flow platforms like DAGuE and OpenStream, and MPI. We also provide a simple concrete implementation that works without the need of any sophisticated run-time mechanism. Our experiments show our simple implementation to be competitive with or better than the wavefront-synchronized code generated by other systems. We also show how the proposed mechanism can achieve 24% to 70% reduction in energy.
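The gap between barrier-synchronized wavefronts and the true tile dependences can be seen in a small sketch. It assumes a 2D grid of tiles with uniform dependences (1,0) and (0,1); the function names are hypothetical and this is not the scheduling mechanism proposed in the paper.

```python
def wavefronts(rows, cols):
    """Group tiles (i, j) of a rows x cols tile grid by wavefront w = i + j.

    Under barrier synchronization, every tile in fronts[w] waits for
    all tiles in fronts[w - 1].
    """
    fronts = [[] for _ in range(rows + cols - 1)]
    for i in range(rows):
        for j in range(cols):
            fronts[i + j].append((i, j))
    return fronts


def predecessors(i, j):
    """Tiles that (i, j) actually needs, assuming dependences (1,0) and (0,1)."""
    return [(a, b) for a, b in ((i - 1, j), (i, j - 1)) if a >= 0 and b >= 0]
```

For a 3 x 4 grid, tile (0, 3) sits in wavefront 3 and needs only tile (0, 2), yet a barrier makes it wait for all of wavefront 2, that is (0, 2), (1, 1) and (2, 0). A data flow style schedule would track only the tiles returned by `predecessors`.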


A Code Generator for Energy-Efficient Wavefront Parallelization of Uniform Dependence Computations

Zou, Rajopadhye 2018

IEEE Trans. Parallel Distrib. Syst.
