2014
DOI: 10.1016/j.cam.2013.09.001

Architecting the finite element method pipeline for the GPU

Abstract: The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core streaming processors like the graphical processing unit (GPU). In this paper, we present the algorithms and data-structures …

Cited by 53 publications (20 citation statements). References 30 publications.
“…Since the computational intensity (the ratio of mathematical operations to size of input data) for the integral calculations is fairly high, a significant speedup should be achievable (Wolters et al, 2002; Ataseven et al, 2008). In the literature, an 87× speed-up has been reported using a FEM GPU implementation (Fu et al, 2014). A comparable speed-up applied here would reduce the computation time required for a single iteration to only 7.5 minutes, and could give SCALE convergence in 1 hour or less.…”
Section: Results
confidence: 99%
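The projection in the statement above can be checked with quick arithmetic: applying the 87× speedup reported by Fu et al. (2014) to reach 7.5 minutes per iteration implies a CPU baseline of roughly 650 minutes per iteration, and an hour-long budget then fits eight GPU iterations. A minimal sketch (the derived baseline is an inference from the quoted figures, not a number stated in the source):

```python
# Back-of-the-envelope check of the speedup figures quoted above.
reported_speedup = 87            # GPU-vs-CPU speedup reported by Fu et al. (2014)
target_minutes_per_iter = 7.5    # projected per-iteration time from the quote

# Implied CPU baseline per iteration (derived, not stated in the source):
cpu_minutes_per_iter = reported_speedup * target_minutes_per_iter
print(cpu_minutes_per_iter)      # 652.5 minutes, i.e. roughly 10.9 hours

# Iterations that fit in the quoted 1-hour convergence budget:
print(60 / target_minutes_per_iter)   # 8.0
```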
“…The principal disadvantage of these strategies lies in the fact that some preprocessing is required, which is time-consuming and hard to parallelize. Recent studies [28] have proposed a compact sparse-matrix data structure and an agglomeration strategy for the assembly step, based on atomic addition operations in device memory. This strategy makes it possible to reduce the memory footprint and avoid the preprocessing.…”
Section: Previous Work
confidence: 99%
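The atomic-addition assembly idea mentioned in the statement above can be sketched as follows. This is an illustrative analogue, not the actual data structure of [28]: each element scatters its local stiffness entries directly into a global sparse accumulator with unordered adds, which is what one thread per element would do on the GPU via `atomicAdd` in device memory, and which needs no preprocessing such as mesh colouring.

```python
# Serial analogue of atomic-add FEM assembly (a sketch, not the scheme of [28]).
from collections import defaultdict

def assemble(elements, k_local):
    """Accumulate element matrices into a COO-style {(row, col): value} map."""
    K = defaultdict(float)
    for elem in elements:                    # on a GPU: one thread per element
        for a, i in enumerate(elem):         # local -> global row index
            for b, j in enumerate(elem):     # local -> global column index
                K[(i, j)] += k_local[a][b]   # analogue of device atomicAdd
    return K

# 1D bar of 4 linear elements over 5 nodes; unit-element local stiffness:
k_local = [[1.0, -1.0], [-1.0, 1.0]]
elements = [(i, i + 1) for i in range(4)]
K = assemble(elements, k_local)
print(K[(1, 1)])   # 2.0 -- an interior node accumulates two element contributions
```

Because the adds are unordered and duplicates simply accumulate, the same loop parallelizes over elements without any ordering or colouring step, which is the point of the atomic strategy.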
“…Instead they use a duplication method similar to that of LULESH and miniAero, as described below. Fu et al [23] also create contiguous patches (blocks) in the mesh to be loaded into shared memory, partitioning the nodes (the to-set) but not the elements (the from-set of the mapping). Furthermore, they do not load all data into shared memory, only what is inside the patch.…”
Section: Related Work
confidence: 99%
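The node-blocking scheme described in the statement above can be sketched as follows. This is a hypothetical illustration, not code from [23]: the nodes (the to-set) are split into contiguous patches while the elements (the from-set of the mapping) are left unpartitioned; each patch stages only its own node data, the analogue of a CUDA thread block loading shared memory, and every element touching the patch is processed by that block, with out-of-patch nodes read from global memory.

```python
# Hypothetical sketch of node-partitioned patches for a 1D mesh.
def patch_schedule(n_nodes, block_size, elements):
    """Return, per contiguous node patch, (staged nodes, touching elements)."""
    schedule = []
    for start in range(0, n_nodes, block_size):
        stop = min(start + block_size, n_nodes)
        staged_nodes = list(range(start, stop))   # "shared memory" load
        touching = [e for e in elements
                    if any(start <= n < stop for n in e)]
        schedule.append((staged_nodes, touching))
    return schedule

elements = [(i, i + 1) for i in range(7)]   # 1D mesh over 8 nodes
for staged, touching in patch_schedule(8, 4, elements):
    print(len(staged), len(touching))       # prints "4 4" for each patch
```

Note that the boundary element (3, 4) appears in both patches: since the from-set is not partitioned, elements straddling a patch boundary are visited by every block they touch, which is consistent with the duplication approach the statement describes.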