2014 International Conference on High Performance Computing & Simulation (HPCS)
DOI: 10.1109/hpcsim.2014.6903670
Analysis of classic algorithms on GPUs

Abstract: The recently developed Threaded Many-core Memory (TMM) model provides a framework for analyzing algorithms for highly-threaded many-core machines such as GPUs. In particular, it tries to capture the fact that these machines hide memory latencies via the use of a large number of threads and large memory bandwidth. The TMM model analysis contains two components: computational complexity and memory complexity. A model is only useful if it can explain and predict empirical data. In this work, we investigat…
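The latency-hiding behavior that the TMM model is built around can be seen in a minimal CUDA sketch (an illustrative example, not code from the paper): a memory-bound kernel issues global loads that each stall for hundreds of cycles, and the hardware hides those stalls only because far more threads are resident than there are cores.

#include <cuda_runtime.h>

// Memory-bound SAXPY kernel: each thread performs two global loads and
// one global store, so per-thread time is dominated by memory latency.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 24;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    // Launching tens of thousands of blocks oversubscribes the cores;
    // while one warp waits on memory, the scheduler runs another. This
    // overlap is what the TMM model's memory-complexity term accounts for.
    int block = 256;
    int grid  = (n + block - 1) / block;
    saxpy<<<grid, block>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    cudaFree(x);
    cudaFree(y);
    return 0;
}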
Cited by 10 publications (3 citation statements). References 27 publications (39 reference statements).
“…Theoretically, as long as $n_K = O(n_G \log n_G)$, the sequential time complexity is $O(n_T n_R n_G \log n_G)$. We achieve near-linear speed-up by distributing the work on multicore CPUs/GPUs, but the precise complexity analysis depends on the GPU architecture [54]. If we use the same resolution to resolve the cutter as the grid size for convolution, then $n_K \ll n_G$, since the cutter is typically much smaller in size than the design domain.…”
Section: Discussion (mentioning; confidence: 99%)
“…As to asymptotic models, Ma et al. [10] designed the Threaded Many-core Memory (TMM) model, in which a number of classic algorithms are analyzed in terms of both their computational complexity and their memory complexity, assuming perfect scheduling [11], [12]. Kirtzic et al. [13] proposed the Parallel GPU Model (PGM), which is essentially an adaptation of the Bulk-Synchronous Parallel (BSP) model [14] and equates a superstep in BSP with a function unit of a GPU program.…”
Section: Introduction (mentioning; confidence: 99%)
“…[13] Utilizing shared memory on a GPU would be beneficial in terms of processing speed in some applications [17, 18]. However, because of the size restriction of shared memory, it is not feasible to utilize such memory in APC algorithm implementations. As described in Secs.…”
Section: Memory Space Usage (mentioning; confidence: 99%)
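The size restriction mentioned here is visible directly in CUDA source: a __shared__ array must fit in the per-block shared-memory budget (48 KB on many devices), so it only works for small tiles, while larger working sets must stay in global memory. A minimal sketch (the tile size and reduction kernel are illustrative, unrelated to the APC implementation):

#include <cuda_runtime.h>

#define TILE 1024   // 1024 floats = 4 KB, comfortably under the limit;
                    // a multi-megabyte working set could NOT be declared
                    // this way, which is the restriction noted above.

// Each block stages one small tile into on-chip shared memory, then
// thread 0 reduces it. Launch with blockDim.x == TILE.
__global__ void tile_sum(const float *in, float *out, int n) {
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    if (threadIdx.x == 0) {
        float acc = 0.0f;
        for (int j = 0; j < TILE; ++j) acc += tile[j];
        out[blockIdx.x] = acc;
    }
}

int main() {
    const int n = 1 << 20;
    const int blocks = n / TILE;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, blocks * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    tile_sum<<<blocks, TILE>>>(in, out, n);
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}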