2020
DOI: 10.48550/arxiv.2010.03660
Preprint

Fast Stencil-Code Computation on a Wafer-Scale Processor

Abstract: The performance of CPU-based and GPU-based systems is often low for PDE codes, where large, sparse, and often structured systems of linear equations must be solved. Iterative solvers are limited by data movement, both between caches and memory and between nodes. Here we describe the solution of such systems of equations on the Cerebras Systems CS-1, a wafer-scale processor that has the memory bandwidth and communication latency to perform well. We achieve 0.86 PFLOPS on a single wafer-scale system for the solut…

Cited by 2 publications (2 citation statements)
References: 13 publications

“…From the cyber-infrastructure side, next-generation AI/ML acceleration hardware continuously evolve to tackle the scalability issue. For example, a recent pilot study in computational fluid dynamics showed that it could be more than 200 times faster than the same workload on an optimized number of cores on the NETL's supercomputer JOULE 2.0 [Rocki et al, 2020]. Similar scaling performance has been reported on other exascale computing clusters involving hundreds of GPU's [Byna et al, 2020].…”
Section: Model Scalibility (mentioning)
confidence: 70%
“…For example, the Lassen system at Lawrence Livermore National Labs has a Cerebras accelerator integrated with it. Recent results show a 0.86 PFLOPS on a single wafer scale chip [8] on stencil problems. Graphcore IPUs and SambaNova are starting to be use in traditional HPC applications [9], [10].…”
Section: Introduction (mentioning)
confidence: 99%