Flextended Tiles

Zhao, Jie; Cohen, Albert

doi:10.1145/3369382

Cited by 11 publications

(3 citation statements)

References 39 publications

“…Additional pre/post-processing is limited to point (pixel-wise) operators. Polyhedral optimization must be considered [38], [39] for extending to stencil or areawise operations.…”

Section: Post-processing (mentioning)

confidence: 99%

SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for Halide

Kanetaka, Takagi, Maeda et al. 2024

IEEE Access

“…In the case of iterative stencils we use the problem size of the outermost space dimension. As an illustration, given a problem size of 1024 and a 2D dominating array, the upper bound produced would be log_{1.15}(64).…”

Section: Modeling Resource Constraints (mentioning)

confidence: 99%

“…Other works such as Flextended Tiles [64] improve on the traditional overlapped tiles [37] with the goal of reducing redundant computations. Previously, split tiling [20] and hexagonal tiling [19] were also used in multiple automated code generators, in particular, for iterative stencil computations, to enhance the parallelism.…”

Section: Related Work (mentioning)

confidence: 99%

Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation

Abdelaal, Kong 2021

Proceedings of the ACM International Conference on Supercomputing

Loop tiling is a key high-level transformation which is known to maximize locality in loop intensive programs. It has been successfully applied to a number of applications including tensor contractions, iterative stencils and machine learning. This technique has also been extended to a wide variety of computational domains and architectures. The performance achieved with this critical transformation largely depends on a set of inputs given, the tile sizes, due to the complex trade-off between locality and parallelism. This problem is exacerbated in GPGPU architectures due to limited hardware resources such as the available shared memory.

In this paper we present a new technique to compute resource conscious tile sizes for affine programs. We use Integer Linear Programming (ILP) constraints and objectives in a cross-compiler fashion to faithfully and effectively mimic the transformations applied in a polyhedral GPU compiler (PPCG). Our approach significantly reduces the need for experimental auto-tuning by generating only two tile size configurations that achieve strong out-of-the-box performance. We evaluate the effectiveness of our technique using the Polybench benchmark suite on two GPGPUs, an AMD Radeon VII and an NVIDIA Tesla V100, using OpenCL and CUDA programming models. Experimental validation reveals that our approach achieves nearly 75% of the best empirically found tile configuration across both architectures.

CCS Concepts: • Software and its engineering → Compilers; • General and reference → Performance; • Mathematics of computing → Combinatorial optimization; • Computer systems organization → Parallel architectures.
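As a rough illustration of the transformation the abstract refers to, the sketch below tiles a dense matrix multiplication in plain Python and estimates the memory touched by one tile. It is not the ILP-based method of the paper and not PPCG output; the function names `matmul_tiled` and `tile_footprint_bytes`, and the simple footprint model (three square tiles of 4-byte elements), are assumptions made only for this example.

```python
def matmul_tiled(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) with square tiles.

    Hypothetical helper: the outer three loops walk over tiles, the inner
    three loops walk over the points inside one tile.
    """
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):              # tile loops
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # point loops, clamped so partial tiles at the edge work
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


def tile_footprint_bytes(tile, elem_bytes=4, arrays=3):
    """Bytes touched by one tile of each array (assumed simple model)."""
    return arrays * tile * tile * elem_bytes


def matmul_naive(a, b, n):
    """Untiled reference used to check the tiled version."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Under this assumed model, a larger tile reuses more data per load but needs a larger footprint, which is the kind of trade-off against limited shared memory that the abstract describes; a call such as `tile_footprint_bytes(32)` can be compared against a device's shared-memory budget before a tile size is accepted.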
