SC18: International Conference for High Performance Computing, Networking, Storage and Analysis 2018
DOI: 10.1109/sc.2018.00049

Associative Instruction Reordering to Alleviate Register Pressure

Abstract: Register allocation is generally considered a practically solved problem. For most applications, the register allocation strategies in production compilers are very effective in controlling the number of loads/stores and register spills. However, existing register allocation strategies are not effective and result in excessive register spilling for computation patterns with a high degree of many-to-many data reuse, e.g., high-order stencils and tensor contractions. We develop a source-to-source instruction reo…

Cited by 11 publications (6 citation statements)
References 25 publications

“…The baseline performance is on-par to (and often exceeds) state-of-the-art GPU-optimized stencil codes reporting the highest performance across different stencil benchmarks. Namely, SSAM [5], register-optimized stencils [59], [60], StencilGen [40], and temporal blocking AN5D [4].…”
Section: B Benchmarks and Datasets 1) Stencil Benchmarks (mentioning)
confidence: 99%
“…Data reuse has also been extensively recognized and exploited. Prior work [33,34,39,54] on optimizing the order of execution instructions could decrease loads/stores operations to relieve the register pressure, while only the individual element in each vector could be reused. Basu designs a vector code generation scheme to reuse several vectors in the computation process, and it is constrained to constant-coefficient and isotropic stencils [6].…”
Section: Related Work (mentioning)
confidence: 99%
“…The first one is based on the associativity of the weighted sums of neighboring points. Specifically, the execution order of one stencil computation can be rearranged to exploit common subexpressions or data reuse at register or cache level [6,12,33,34,54]. Consequently, the number of load/store operations can be reduced and the bandwidth usage is alleviated in optimized execution order.…”
Section: Introduction (mentioning)
confidence: 99%
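
The reordering idea in the statement above can be illustrated with a minimal sketch. This is not the cited paper's tool or algorithm; the function names `stencil_gather` and `stencil_scatter` are hypothetical, and the example assumes a 1D weighted-sum stencil over a plain Python list. Because each output is a sum of weighted inputs, the contributions can be accumulated as the inputs arrive rather than by re-reading every neighbor per output.

```python
def stencil_gather(a, w):
    """Gather order: each output re-reads all of its neighbors.

    out[i] = sum over k of w[k] * a[i + k]
    """
    r = len(w)
    n = len(a) - r + 1
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(r):
            acc += w[k] * a[i + k]  # a[i + k] is read again for other outputs
        out[i] = acc
    return out


def stencil_scatter(a, w):
    """Reordered (scatter) form: each input is read once and added
    to every partial sum that depends on it."""
    r = len(w)
    n = len(a) - r + 1
    out = [0.0] * n
    for j, x in enumerate(a):       # single read of a[j]
        for k in range(r):
            i = j - k               # output that uses a[j] with weight w[k]
            if 0 <= i < n:
                out[i] += w[k] * x
    return out


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    w = [1, 2, 1]
    print(stencil_gather(a, w))
    print(stencil_scatter(a, w))
```

In the scatter form each input value is consumed immediately after it is read, at the cost of keeping up to `len(w)` partial sums live at once. In floating point, reordering a sum can change rounding in general, so reordered results should be compared with a tolerance rather than for exact equality.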
“…Register allocation is generally considered a practically solved problem [15]. For most applications, the register allocation strategy in the production compiler is very effective in controlling the number of loads/stores and register overflows.…”
Section: Instruction Reordering (mentioning)
confidence: 99%