2008
DOI: 10.1016/j.jpdc.2008.05.008

Algorithmic performance studies on graphics processing units

Cited by 54 publications (16 citation statements)
References 15 publications
“…While GPUs can partially hide the off-loading overhead with asynchronous data transfer (i.e., double-buffering), this mechanism currently works only for page-locked memory and incurs additional programming overhead [20]. To amortize the off-loading overhead, GPUs require higher computational intensity than other processors [6,28,16]. However, the Tesla C1060's on-board memory is much larger (4 GB) than the Harpertown or Barcelona's cache memory (12 or 2 MB) or the Cell/B.E.…”
Section: Start-up Overhead (mentioning)
confidence: 99%
“…On current processors, with several levels of cache memory, it is possible to carefully orchestrate the memory accesses for these type of operations achieving high performance. A few studies on modern GPUs [8], [7], [9] show how, for this type of operations, these hardware accelerators can deliver up to 10× speed-ups compared with highly tuned implementations on a general-purpose processor, even taking into account the overhead introduced by the data transfers through the PCI-Express bus.…”
Section: A Flame Methodology: Algorithmic Variants (mentioning)
confidence: 99%
“…Only implicit schemes are considered as system solving strategy, with performance comparison of various linear system solvers (both direct and iterative). Other extensive studies on the performances of different linear solvers have been carried out for example in [16,17,18,19]. On the other hand, only few GPU implementations of FEM in explicit dynamics are available in the literature.…”
Section: Introduction (mentioning)
confidence: 99%