2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
DOI: 10.1109/sbac-pad.2016.18

Speeding Up Stencil Computations with Kernel Convolution

Cited by 1 publication (3 citation statements, all of type "mentioning"; citing publication year: 2019). References 1 publication.

“…This example also highlights how the DCMI strategy differs from stencil computing strategies for CPUs and GPUs. One example is ASLI [18], which is similar to DCMI as it creates a new stencil operator that covers multiple time-steps by convolving the operator with itself. Although this approach enables data reuse within a cone, it suffers from the same redundant computation issue as CA, because it does not enable reuse between cones (see Figure 2).…”
Section: Why Does DCMI Use Minimal OCM? (mentioning; confidence: 99%)
“…A number of approaches that optimize ISLs for CPUs and GPUs combine computations from different loop levels to reduce the amount of redundant computation. ASLI [18] is an application-level technique that creates a new stencil operator that covers multiple time-steps by convolving the original stencil operator with itself two or more times. The compiler optimizations loop unrolling (e.g., References [22,45,61]) and forward substitution (e.g., Reference [24]) can be used to achieve similar gains.…”
Section: CPUs, GPUs, and ASIC Accelerators (mentioning; confidence: 99%)
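To make the self-convolution idea in these statements concrete, the following is a minimal 1D NumPy sketch, not code from the ASLI paper (which targets real multidimensional stencils); all function and variable names are illustrative. It shows that convolving a 3-point stencil kernel with itself yields a 5-point operator, and that one sweep of that wider operator matches two ordinary sweeps everywhere except at the boundaries.

    # Minimal sketch: applying a stencil kernel twice equals one application
    # of the kernel convolved with itself (associativity of convolution).
    import numpy as np

    def self_convolve(kernel, times=2):
        # Convolve the kernel with itself (times - 1) further times;
        # each convolution widens the operator by (len(kernel) - 1) points.
        result = kernel
        for _ in range(times - 1):
            result = np.convolve(result, kernel)
        return result

    def apply_stencil(grid, kernel):
        # One sweep over a 1D grid with implicit zero-padded boundaries.
        return np.convolve(grid, kernel, mode="same")

    k = np.array([0.25, 0.5, 0.25])             # 3-point averaging stencil
    grid = np.random.default_rng(0).random(16)

    two_sweeps = apply_stencil(apply_stencil(grid, k), k)
    k2 = self_convolve(k, times=2)              # 5-point, two-time-step operator
    one_sweep = apply_stencil(grid, k2)

    # Interior points agree; boundary points differ because of the zero
    # padding, so a tiled implementation must recompute them per tile.
    assert np.allclose(two_sweeps[2:-2], one_sweep[2:-2])

The single sweep with the self-convolved operator avoids writing and re-reading the intermediate grid, which is the data-reuse benefit the statements attribute to ASLI; the boundary mismatch in the sketch is a 1D analogue of the redundant computation between cones that the citing paper identifies as the remaining cost.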