2019
DOI: 10.1007/978-3-030-34356-9_12

Batch Solution of Small PDEs with the OPS DSL

Abstract: In this paper we discuss the challenges and optimisation opportunities when solving a large number of small, equally sized discretised PDEs on regular grids. We present an extension of the OPS (Oxford Parallel library for Structured meshes) embedded Domain Specific Language, and show how support can be added for solving multiple systems, and how OPS makes it easy to deploy a variety of transformations and optimisations. The new capabilities in OPS make it possible to automatically apply data structure transformations, a…

Cited by 5 publications (15 citation statements)
References 20 publications

“…However, we note that OpenCL could equally be used to implement the same design. Finally, we compare performance on the FPGA to an NVIDIA Tesla V100 GPU using the tridiagonal solver library, tridsolver, implemented by László et al. [13], [1], using its batched version presented by Reguly et al. [22]. This GPU library has been shown [6] to provide matching or better performance than the two current batch tridiagonal solver functions, cusparse<t>gtsv2StridedBatch() and cusparse<t>gtsvInterleavedBatch(), in NVIDIA's cuSPARSE library [4], [25].…”
Section: Performance (mentioning)
confidence: 99%
“…Regarding runtime performance, we measured the performance of a stochastic local volatility (SLV) model ported to OPS (see [14]). SLV models constitute the state of the art for describing asset price processes, notably foreign exchange rates.…”
Section: Results (mentioning)
confidence: 99%
“…Thus if a large number of smaller meshes are to be solved, as is the case in financial applications [27], then processing one mesh at a time incurs significant latencies. This motivates the idea of grouping together meshes with the same dimensions in batches, increasing the overall throughput of the solve.…”
Section: B Batching (mentioning)
confidence: 99%
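
The batching idea in the statement above can be illustrated with a minimal sketch. It is not taken from the cited papers or from OPS: it assumes NumPy, a plain 5-point averaging sweep as a stand-in for the real solver, and hypothetical function names (`sweep_single`, `sweep_batched`). It only shows that equally sized meshes can be stacked along a batch axis and updated in one call instead of one mesh at a time.

```python
import numpy as np

def sweep_single(u):
    """One 5-point averaging sweep on a single 2D mesh (boundary values kept)."""
    out = u.copy()
    out[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
    )
    return out

def sweep_batched(batch):
    """The same sweep on a stack of meshes with shape (n_meshes, ny, nx)."""
    out = batch.copy()
    out[:, 1:-1, 1:-1] = 0.25 * (
        batch[:, :-2, 1:-1] + batch[:, 2:, 1:-1]
        + batch[:, 1:-1, :-2] + batch[:, 1:-1, 2:]
    )
    return out

# Illustrative sizes only: 100 small meshes of 8 x 8 points each.
rng = np.random.default_rng(0)
meshes = [rng.random((8, 8)) for _ in range(100)]

# Unbatched: one call per mesh.
one_at_a_time = np.stack([sweep_single(m) for m in meshes])

# Batched: meshes with identical dimensions share one array and one call.
batch = np.stack(meshes)
all_at_once = sweep_batched(batch)
```

Both paths compute the same values; the batched form replaces many small calls with one larger one, which is the throughput argument the citing authors make for devices that a single small mesh cannot saturate.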
“…Baseline FPGA performance is significantly better than on the V100, since the GPU is not saturated by this application. The batching of 2D meshes as in [27] improves GPU performance significantly and offers a closer comparison. The FPGA achieves a maximum speedup of about 30-34% for different mesh sizes and batching sizes of 100 (100B) and 1000 (1000B).…”
Section: A Poisson-5pt-2d (mentioning)
confidence: 99%