2013 IEEE 27th International Symposium on Parallel and Distributed Processing 2013
DOI: 10.1109/ipdps.2013.79

Optimizing and Auto-Tuning Iterative Stencil Loops for GPUs with the In-Plane Method

Abstract: Stencils represent an important class of computations that are used in many scientific disciplines. Increasingly, many of the stencil computations in scientific applications are being offloaded to GPUs to improve running times. Since a large part of the simulation time is spent inside the stencil kernels, optimizing the kernel is therefore important in the context of achieving greater computation efficiencies and reducing simulation time. In this work, we proposed a novel in-plane method for stencil c…

Cited by 12 publications (8 citation statements)
References 18 publications
“…We also compare our results for the 3D stencil with the results reported in [10]. This work uses a different equation in which neighbor cells with the same distance from center share the same coefficient and hence, the FLOP per cell update for their stencil is lower.…”
Section: B Hardware and Software Setup (mentioning)
confidence: 78%
“…• We show that compared to the state-of-the-art YASK framework [9] on a modern Xeon and Xeon Phi Processor, and a recent GPU implementation [10], our FPGA implementation achieves better performance for 2D stencil computation, and competitive performance for 3D, and better power efficiency in almost all cases.…”
Section: Introduction (mentioning)
confidence: 90%
“…Some existing GPGPU auto-tuning works have used intelligent, nonexhaustive strategies [12-14, 20, 26]. However, all of this work (with the exception of Tang's which develops an analytical model to approximate performance [26]) focuses exclusively on a single optimization. In contrast, we consider multiple optimizations concurrently and our strategy takes into account the interactions amongst these optimizations.…”
Section: Related Work (mentioning)
confidence: 99%
“…In recent times, Machine Learning (ML)–based autotuning systems are employed to determine the parameter space. These systems use a limited set of conventional parameters such as block size, shared memory size, number of registers to identify optimal values for the parameter space.…”
Section: Introduction (mentioning)
confidence: 99%