Casper: Accelerating Stencil Computation using Near-cache Processing

Denzler, Alain; Bera, Rahul; Hajinazar, Nastaran; Singh, Gagandeep; Oliveira, Geraldo F.; Gómez-Luna, Juan; Mutlu, Onur

doi:10.48550/arxiv.2112.14216

Cited by 5 publications

(5 citation statements)

References 29 publications

“…PrIM is open source and publicly available at [168]. Unlike these prior works, DAMOV is applicable to and can be used to study other PIM architectures than processing-in/-near DRAM, including processing-in/-near cache [68, 93–95, 169–171], processing-in/-near storage [40, 172–181], and processing-in/-near emerging NVMs [81, 82, 90, 91, 100, 182, 183]. This is possible since DAMOV's methodology and benchmarks are mainly concerned with broadly characterizing data movement bottlenecks in an application, independent of the underlying PIM architecture.…”

Section: Discussion (mentioning)

confidence: 99%

Methodologies, Workloads, and Tools for Processing-in-Memory: Enabling the Adoption of Data-Centric Architectures

Oliveira, Gómez-Luna, Ghose, et al. 2022

Preprint

Self Cite

“…Naively employing PIM to accelerate data-intensive workloads can lead to sub-optimal performance due to the many design constraints PIM substrates impose (e.g., limited area and power budget available inside 3D-stacked memories [6] or manufacturing limitations of combining memory and logic elements [6, 13]). Therefore, many recent works co-design specialized PIM accelerators and algorithms to improve performance and reduce the energy consumption of (i) applications from various application domains, such as graph processing, machine learning [1, ...], bioinformatics, high-performance computing [95, 101–112], databases [18, 19, 29, 46, 60, 113–130], security [131–135...…”

Section: Motivation and Problem (mentioning)

confidence: 99%

Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases

Oliveira, Boroumand, Ghose, et al. 2022

Preprint

Self Cite

“…Though the applications are diverse in scientific computing, analogy to AI, there are several common and performance-critical operations in scientific computing, named Dwarf, defined by the Berkeley View [63]. As one of the seven computational Dwarfs, Stencil is ubiquitously involved in various scientific computing [14], which lies at the heart of thermal diffusion (∼100%), earth system model (>90%), and earthquake prediction model (>90%), etc [34,59,8,40,60,15,12].…”

Section: Introduction (mentioning)

confidence: 99%

Gamify Stencil Dwarf on Cloud for Democratizing Scientific Computing

Li, Li, Chen, et al. 2023

Preprint

Stencil computation is one of the most important kernels in various scientific computing. Nowadays, most Stencil-driven scientific computing still relies heavily on supercomputers, suffering from expensive access, poor scalability, and duplicated optimizations. This paper proposes Tetris, the first system for high-performance Stencil on heterogeneous CPU+GPU, towards democratizing Stencil-driven scientific computing on Cloud. In Tetris, polymorphic tiling tetrominoes are first proposed to bridge different hardware architectures and various application contexts with a perfect spatial and temporal tessellation automatically. Tetris is contributed by three main components: (1) Underlying hardware characteristics are first captured to achieve a sophisticated Pattern Mapping by register-level tetrominoes; (2) An efficient Locality Enhancer is first presented for data reuse on spatial and temporal dimensions simultaneously by cache/SMEM-level tetrominoes; (3) A novel Concurrent Scheduler is first designed to exploit the full potential of on-cloud memory and computing power by memory-level tetrominoes. Tetris is orthogonal to (and complements) the optimizations or deployments for a wide variety of emerging and legacy scientific computing applications. Results of thermal diffusion simulation demonstrate that the performance is improved by 29.6×, reducing time cost from day to hour, while preserving the original accuracy.
