2014
DOI: 10.1016/j.cam.2013.09.001

Architecting the finite element method pipeline for the GPU

Abstract: The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core streaming processors like the graphical processing unit (GPU). In this paper, we present the algorithms and data-structures …

Cited by 53 publications (20 citation statements). References 30 publications.
“…Since the computational intensity (the ratio of mathematical operations to size of input data) for the integral calculations is fairly high, a significant speedup should be achievable (Wolters et al, 2002; Ataseven et al, 2008). In the literature, an 87× speed-up has been reported using a FEM GPU implementation (Fu et al, 2014). A comparable speed-up applied here would reduce the computation time required for a single iteration to only 7.5 minutes, and could give SCALE convergence in 1 hour or less.…”
Section: Results
confidence: 99%
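The projection in the statement above can be checked with quick arithmetic: applying the 87× speedup reported by Fu et al. (2014) to reach 7.5 minutes per iteration implies a CPU baseline of roughly 650 minutes per iteration, and an hour-long budget then fits eight GPU iterations. A minimal sketch (the derived baseline is an inference from the quoted figures, not a number stated in the source):

```python
# Back-of-the-envelope check of the speedup figures quoted above.
reported_speedup = 87            # GPU-vs-CPU speedup reported by Fu et al. (2014)
target_minutes_per_iter = 7.5    # projected per-iteration time from the quote

# Implied CPU baseline per iteration (derived, not stated in the source):
cpu_minutes_per_iter = reported_speedup * target_minutes_per_iter
print(cpu_minutes_per_iter)      # 652.5 minutes, i.e. roughly 10.9 hours

# Iterations that fit in the quoted 1-hour convergence budget:
print(60 / target_minutes_per_iter)   # 8.0
```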
“…The principal disadvantage of these strategies lies in the fact that some preprocessing is required, which is time-consuming and hard to parallelize. Recent studies [28] have proposed a compact sparse-matrix data structure and an agglomeration strategy for the assembly step, based on atomic addition operations in device memory. This strategy makes it possible to reduce the memory footprint and avoid the preprocessing.…”
Section: Previous Work
confidence: 99%
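The atomic-addition assembly idea mentioned in the statement above can be sketched as follows. This is an illustrative analogue, not the actual data structure of [28]: each element scatters its local stiffness entries directly into a global sparse accumulator with unordered adds, which is what one thread per element would do on the GPU via `atomicAdd` in device memory, and which needs no preprocessing such as mesh colouring.

```python
# Serial analogue of atomic-add FEM assembly (a sketch, not the scheme of [28]).
from collections import defaultdict

def assemble(elements, k_local):
    """Accumulate element matrices into a COO-style {(row, col): value} map."""
    K = defaultdict(float)
    for elem in elements:                    # on a GPU: one thread per element
        for a, i in enumerate(elem):         # local -> global row index
            for b, j in enumerate(elem):     # local -> global column index
                K[(i, j)] += k_local[a][b]   # analogue of device atomicAdd
    return K

# 1D bar of 4 linear elements over 5 nodes; unit-element local stiffness:
k_local = [[1.0, -1.0], [-1.0, 1.0]]
elements = [(i, i + 1) for i in range(4)]
K = assemble(elements, k_local)
print(K[(1, 1)])   # 2.0 -- an interior node accumulates two element contributions
```

Because the adds are unordered and duplicates simply accumulate, the same loop parallelizes over elements without any ordering or colouring step, which is the point of the atomic strategy.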
“…Instead they use a duplication method similar to that of LULESH and miniAero, as described below. Fu et al [23] also create contiguous patches (blocks) in the mesh to be loaded into shared memory, partitioning the nodes (the to-set) but not the elements (the from-set of the mapping). Furthermore, they do not load all data into shared memory, only what is inside the patch.…”
Section: Related Work
confidence: 99%
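The node-blocking scheme described in the statement above can be sketched as follows. This is a hypothetical illustration, not code from [23]: the nodes (the to-set) are split into contiguous patches while the elements (the from-set of the mapping) are left unpartitioned; each patch stages only its own node data, the analogue of a CUDA thread block loading shared memory, and every element touching the patch is processed by that block, with out-of-patch nodes read from global memory.

```python
# Hypothetical sketch of node-partitioned patches for a 1D mesh.
def patch_schedule(n_nodes, block_size, elements):
    """Return, per contiguous node patch, (staged nodes, touching elements)."""
    schedule = []
    for start in range(0, n_nodes, block_size):
        stop = min(start + block_size, n_nodes)
        staged_nodes = list(range(start, stop))   # "shared memory" load
        touching = [e for e in elements
                    if any(start <= n < stop for n in e)]
        schedule.append((staged_nodes, touching))
    return schedule

elements = [(i, i + 1) for i in range(7)]   # 1D mesh over 8 nodes
for staged, touching in patch_schedule(8, 4, elements):
    print(len(staged), len(touching))       # prints "4 4" for each patch
```

Note that the boundary element (3, 4) appears in both patches: since the from-set is not partitioned, elements straddling a patch boundary are visited by every block they touch, which is consistent with the duplication approach the statement describes.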