2009
DOI: 10.1016/j.jpdc.2009.01.006

Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA

Cited by 156 publications (116 citation statements)
References 21 publications
“…Another important property of the SEM is the fact that it can be parallelized efficiently to take advantage of the distributed structure of modern supercomputers [33], and in particular on clusters of Graphics Processing Units (GPU) graphics cards [34][35][36], reaching speedup factors of more than an order of magnitude compared to a reference serial implementation on a CPU core; this makes it compare well in terms of performance to less flexible algorithms such as finite differences in the time domain (FDTD), which can also be implemented efficiently on GPUs [37,38].…”
Section: The Spectral-element Methods (mentioning)
confidence: 99%
“…Because this is a more general problem, several studies focused on the performance evaluation of this operation on the GPU [4,3,27] and in the FEM context [13]. Finite element assembly has also been investigated both in special cases [4,17,16,12] and in more general cases to show the alternative approaches to matrix assembly for more general problems [6,18,11,23]. Their results show a speedup of 10 to 50 compared to single thread CPU performance in the assembly phase, but only very limited speedup in the iterative solution phase.…”
Section: Related Work (mentioning)
confidence: 99%
“…Komatitsch et al [34] ported an unstructured mesh model of the earth to CUDA; their approach was to use high-order spectral elements of five nodes in each of three dimensions. The resulting 125-node elements fitted well within a block of 128 threads, which could then be efficiently arranged in memory; while this is an elegant solution it is unlikely to be practical for the general case, particularly when considering lower order problems.…”
Section: Introduction (mentioning)
confidence: 99%
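
The arithmetic behind the last statement can be checked with a short sketch. It assumes one thread per element node and a block size rounded up to a multiple of the 32-thread NVIDIA warp; the helper names (`block_size`, `thread_to_node`) and the i-fastest index ordering are illustrative assumptions, not taken from the cited paper.

```python
WARP = 32  # threads per warp on NVIDIA GPUs


def block_size(nodes_per_dim):
    """Smallest multiple of WARP that holds one thread per element node."""
    nodes = nodes_per_dim ** 3
    return ((nodes + WARP - 1) // WARP) * WARP


def thread_to_node(tid, nodes_per_dim=5):
    """Map a flat thread index to a local (i, j, k) node index.

    Returns None for padding threads that have no node to work on.
    The i-fastest ordering is an assumption made for this sketch.
    """
    if tid >= nodes_per_dim ** 3:
        return None
    k, rem = divmod(tid, nodes_per_dim * nodes_per_dim)
    j, i = divmod(rem, nodes_per_dim)
    return (i, j, k)


def idle_threads(nodes_per_dim):
    """Number of padding threads in a block for the given element order."""
    return block_size(nodes_per_dim) - nodes_per_dim ** 3


if __name__ == "__main__":
    for n in (5, 3, 2):
        print(n, n ** 3, block_size(n), idle_threads(n))
```

Under these assumptions a 5 x 5 x 5 element leaves only 3 of 128 threads idle, whereas a 2 x 2 x 2 element would leave 24 of 32 idle if it kept one element per block. That is one plausible reading of why the quoted authors consider the layout less practical for lower-order problems.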