Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA

Komatitsch, Dimitri; Michéa, David; Erlebacher, Gordon

doi:10.1016/j.jpdc.2009.01.006

Cited by 156 publications

(116 citation statements)

References 21 publications

Supporting

Mentioning

103

Contrasting

Unclassified


“…Another important property of the SEM is the fact that it can be parallelized efficiently to take advantage of the distributed structure of modern supercomputers [33], and in particular on clusters of Graphics Processing Units (GPU) graphics cards [34][35][36], reaching speedup factors of more than an order of magnitude compared to a reference serial implementation on a CPU core; this makes it compare well in terms of performance to less flexible algorithms such as finite differences in the time domain (FDTD), which can also be implemented efficiently on GPUs [37,38].…”

Section: The Spectral-element Methods (mentioning)

confidence: 99%

Elastic surface waves in crystals – Part 2: Cross-check of two full-wave numerical modeling methods

et al. 2011

Self Cite


Nathalie Favretto-Cristini. Elastic surface waves in crystals - Part 2: cross-check of two full-wave numerical modeling methods. Ultrasonics, Elsevier, 2011, 51 (8)

Abstract: We obtain the full-wave solution for the wave propagation at the surface of anisotropic media using two spectral numerical modeling algorithms. The simulations focus on media of cubic and hexagonal symmetries, for which the physics has been reviewed and clarified in a companion paper. Even in the case of homogeneous media, the solution requires the use of numerical methods because the analytical Green's function cannot be obtained in the whole space. The algorithms proposed here allow for a general material variability and the description of arbitrary crystal symmetry at each grid point of the numerical mesh. They are based on high-order spectral approximations of the wave field for computing the spatial derivatives. We test the algorithms by comparison to the analytical solution and obtain the wave field at different faces (stress-free surfaces) of apatite, zinc and copper. Finally, we perform simulations in heterogeneous media, where no analytical solution exists in general, showing that the modeling algorithms can handle large impedance variations at the interface.
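To illustrate what "high-order spectral approximations ... for computing the spatial derivatives" can look like in practice, here is a minimal sketch. It assumes a degree-4 Lagrange basis on the five Gauss-Lobatto-Legendre points of the reference interval [-1, 1], a common choice in spectral-element codes; the abstract does not state which basis or degree the authors used, and the function names are hypothetical.

```python
import math

# Assumption: 5 Gauss-Lobatto-Legendre points (polynomial degree 4) on [-1, 1].
GLL_POINTS = [-1.0, -math.sqrt(3.0 / 7.0), 0.0, math.sqrt(3.0 / 7.0), 1.0]


def lagrange_derivative_matrix(points):
    """Return D with D[i][j] = l_j'(x_i), where l_j is the j-th Lagrange polynomial."""
    n = len(points)
    # Barycentric weights: w_j = 1 / prod_{k != j} (x_j - x_k).
    weights = []
    for j in range(n):
        prod = 1.0
        for k in range(n):
            if k != j:
                prod *= points[j] - points[k]
        weights.append(1.0 / prod)

    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i][j] = (weights[j] / weights[i]) / (points[i] - points[j])
        # Each row sums to zero because the derivative of a constant is zero.
        D[i][i] = -sum(D[i][j] for j in range(n) if j != i)
    return D


def differentiate(D, nodal_values):
    """Apply the derivative matrix to a field sampled at the nodes."""
    n = len(nodal_values)
    return [sum(D[i][j] * nodal_values[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    D = lagrange_derivative_matrix(GLL_POINTS)
    cubic = [x ** 3 for x in GLL_POINTS]
    print(differentiate(D, cubic))  # approximates 3 * x**2 at each node
```

With five nodes the derivative is exact (up to rounding) for any polynomial of degree four or lower, which is why the cubic above is a convenient check.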

“…Because this is a more general problem, several studies focused on the performance evaluation of this operation on the GPU [4,3,27] and in the FEM context [13]. Finite element assembly has also been investigated both in special cases [4,17,16,12] and in more general cases to show the alternative approaches to matrix assembly for more general problems [6,18,11,23]. Their results show a speedup of 10 to 50 compared to single thread CPU performance in the assembly phase, but only very limited speedup in the iterative solution phase.…”

Section: Related Work (mentioning)

confidence: 99%

Finite Element Algorithms and Data Structures on Graphical Processing Units

Reguly and Giles 2013

Int J Parallel Prog


The finite element method (FEM) is one of the most commonly used techniques for the solution of partial differential equations on unstructured meshes. This paper discusses both the assembly and the solution phases of the FEM with special attention to the balance of computation and data movement. We present a GPU assembly algorithm that scales to arbitrary degree polynomials used as basis functions, at the expense of redundant computations. We show how the storage of the stiffness matrix affects the performance of both the assembly and the solution. We investigate two approaches: global assembly into the CSR and ELLPACK matrix formats and matrix-free algorithms, and show the trade-off between the amount of indexing data and stiffness data. We discuss the performance of different approaches in light of the implicit caches on Fermi GPUs and show a speedup over a two-socket 12-core CPU of up to 10 times in the assembly and up to 6 times in the solution phase. We present our sparse matrix-vector multiplication algorithms that are part of a conjugate gradient iteration and show that a matrix-free approach may be up to two times faster than global assembly approaches and up to 4 times faster than NVIDIA's cuSPARSE library, depending on the preconditioner used.
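The two global-assembly storage formats named in the abstract can be compared with a small sketch. This is a plain serial Python illustration of CSR and ELLPACK sparse matrix-vector products, not the authors' GPU kernels; all function names are hypothetical, and the 3x3 test matrix is made up for the example.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """y = A @ x with A stored in CSR (row pointers, column indices, values)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def csr_to_ellpack(row_ptr, col_idx, values):
    """Pad every row to the length of the longest row (padding value 0.0, column 0)."""
    n_rows = len(row_ptr) - 1
    width = max((row_ptr[r + 1] - row_ptr[r] for r in range(n_rows)), default=0)
    ell_cols = [[0] * width for _ in range(n_rows)]
    ell_vals = [[0.0] * width for _ in range(n_rows)]
    for row in range(n_rows):
        for slot, k in enumerate(range(row_ptr[row], row_ptr[row + 1])):
            ell_cols[row][slot] = col_idx[k]
            ell_vals[row][slot] = values[k]
    return ell_cols, ell_vals


def ellpack_matvec(ell_cols, ell_vals, x):
    """y = A @ x with A stored in ELLPACK; padded entries contribute 0.0."""
    y = []
    for cols, vals in zip(ell_cols, ell_vals):
        y.append(sum(v * x[c] for c, v in zip(cols, vals)))
    return y


if __name__ == "__main__":
    # Tridiagonal example: [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
    row_ptr = [0, 2, 5, 7]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    x = [1.0, 2.0, 3.0]
    print(csr_matvec(row_ptr, col_idx, values, x))
    print(ellpack_matvec(*csr_to_ellpack(row_ptr, col_idx, values), x))
```

CSR stores exactly the nonzeros plus one pointer per row, while ELLPACK trades some padding for a fixed row width, which gives a regular access pattern. A matrix-free approach, also discussed in the abstract, would avoid storing the global matrix at all and is not shown here.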


“…Komatitsch et al [34] ported an unstructured mesh model of the earth to CUDA; their approach was to use high-order spectral elements of five nodes in each of three dimensions. The resulting 125-node elements fitted well within a block of 128 threads, which could then be efficiently arranged in memory; while this is an elegant solution it is unlikely to be practical for the general case, particularly when considering lower order problems.…”

Section: Introduction (mentioning)

confidence: 99%
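The arithmetic behind the statement quoted above (5 x 5 x 5 = 125 nodes per element, one thread block of 128 threads) can be sketched as follows. This is a plain Python model of the index mapping, not code from the cited paper: the helper names are hypothetical, and padding each element's storage to 128 entries is an assumption made here to show how such a layout could keep elements aligned in memory.

```python
NGLL = 5                          # nodes per direction, as stated in the quote
NODES_PER_ELEMENT = NGLL ** 3     # 125
THREADS_PER_BLOCK = 128           # 4 warps of 32 threads


def thread_to_node(thread_id):
    """Map a thread index within a block to local node indices (i, j, k).

    Threads 125..127 have no node to work on and return None.
    """
    if not 0 <= thread_id < THREADS_PER_BLOCK:
        raise ValueError("thread_id outside the block")
    if thread_id >= NODES_PER_ELEMENT:
        return None
    i = thread_id % NGLL
    j = (thread_id // NGLL) % NGLL
    k = thread_id // (NGLL * NGLL)
    return (i, j, k)


def padded_offset(element, thread_id):
    """Assumed layout: each element occupies 128 consecutive array slots."""
    return element * THREADS_PER_BLOCK + thread_id


if __name__ == "__main__":
    idle = sum(1 for t in range(THREADS_PER_BLOCK) if thread_to_node(t) is None)
    print("idle threads per block:", idle)
    print("thread utilization:", NODES_PER_ELEMENT / THREADS_PER_BLOCK)
```

Only 3 of 128 threads are idle for this element size. For a lower-order element, for example 2 x 2 x 2 = 8 nodes, the same one-element-per-block mapping would leave most of a 128-thread block unused, which is consistent with the quoted remark that the approach may not carry over to lower-order problems.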