2008
DOI: 10.1145/1556444.1556447

On sorting and load balancing on GPUs

Abstract: In this paper we examine GPU-Quicksort, an efficient Quicksort algorithm suited to highly parallel multi-core graphics processors. Quicksort had previously been considered an inefficient sorting solution for graphics processors, but GPU-Quicksort often performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors. We als…
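The abstract does not describe how the partition step is parallelised. A common approach to data-parallel partitioning is to have each chunk of the input count its elements below and above the pivot, turn those counts into disjoint write offsets with a prefix sum, and then let every chunk scatter into its own region without conflicts. The following is a sequential Python simulation of that idea, not the paper's kernel. The names `chunked_partition`, `quicksort` and `chunk_size`, and the median-of-three pivot choice, are hypothetical. Each chunk stands in for a GPU thread block.

```python
from itertools import accumulate


def exclusive_prefix_sum(counts):
    """Exclusive prefix sum: the offset where each chunk starts writing."""
    return [0] + list(accumulate(counts))[:-1]


def chunked_partition(data, pivot, chunk_size=4):
    """Three-way partition of `data` around `pivot`.

    Chunks (stand-ins for GPU thread blocks) first count, then use
    prefix sums to obtain disjoint output ranges, then scatter.
    Returns (out, lt_end, gt_start): out[:lt_end] < pivot,
    out[lt_end:gt_start] == pivot, out[gt_start:] > pivot.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Phase 1: each chunk counts its elements below / above the pivot.
    lt_counts = [sum(1 for x in c if x < pivot) for c in chunks]
    gt_counts = [sum(1 for x in c if x > pivot) for c in chunks]
    # Phase 2: prefix sums turn per-chunk counts into disjoint offsets.
    lt_offsets = exclusive_prefix_sum(lt_counts)
    lt_end = sum(lt_counts)
    gt_start = len(data) - sum(gt_counts)
    gt_offsets = [gt_start + off for off in exclusive_prefix_sum(gt_counts)]
    # Phase 3: scatter; the untouched middle keeps copies of the pivot.
    out = [pivot] * len(data)
    for c, lo, hi in zip(chunks, lt_offsets, gt_offsets):
        for x in c:
            if x < pivot:
                out[lo] = x
                lo += 1
            elif x > pivot:
                out[hi] = x
                hi += 1
    return out, lt_end, gt_start


def quicksort(data, chunk_size=4):
    """Recursive quicksort on top of chunked_partition (median-of-three pivot)."""
    if len(data) <= 1:
        return list(data)
    pivot = sorted([data[0], data[len(data) // 2], data[-1]])[1]
    out, lt_end, gt_start = chunked_partition(data, pivot, chunk_size)
    return (quicksort(out[:lt_end], chunk_size)
            + out[lt_end:gt_start]
            + quicksort(out[gt_start:], chunk_size))
```

For example, `chunked_partition([5, 1, 9, 5, 2, 8, 7, 5], 5)` returns `([1, 2, 5, 5, 5, 9, 8, 7], 2, 5)`. Because each chunk knows its offsets before writing, phase 3 needs no locks or atomics. The sub-arrays on either side of the pivots are then independent and can be sorted concurrently.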

Cited by 23 publications (24 citation statements); references 8 publications.
“…Other approaches for work distributions on GPUs have explored lock‐free work queues and work stealing approaches [CT08]. We have found that these methods work well as long as parallel tasks are relatively heavy‐weight, such as the octree construction in [CT08], and our results for BVH construction are that work stealing provides roughly equivalent or slightly faster execution time compared to work balancing. However, for our collision and distance queries each operation is far more fine‐grained and even efficient work queue methods have overhead that dominates the overall computation as shown in Fig.…”
Section: Analysis and Comparison (mentioning; confidence: 62%)
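The quoted study compares work stealing with static work balancing. Below is a single-threaded, round-based Python simulation of the work-stealing *policy*, assuming tasks that spawn child tasks (a stand-in for octree subdivision). It shows only the deque discipline, not the lock-free synchronization that a real GPU work queue such as the one in [CT08] requires. `run_work_stealing` and `expand` are hypothetical names.

```python
from collections import deque


def run_work_stealing(initial_tasks, num_workers, expand):
    """Round-based simulation of work stealing.

    Each worker owns a deque. It takes its own work from the bottom
    (newest first) and, when its deque is empty, steals from the top
    (oldest first) of the next non-empty victim. `expand(task)` returns
    the child tasks spawned by processing `task`.
    Returns how many tasks each worker processed.
    """
    deques = [deque() for _ in range(num_workers)]
    deques[0].extend(initial_tasks)        # all work starts on worker 0
    processed = [0] * num_workers
    while any(deques):
        for w in range(num_workers):
            if not deques[w]:
                # Idle worker: steal the oldest task from another deque.
                for k in range(1, num_workers):
                    victim = deques[(w + k) % num_workers]
                    if victim:
                        deques[w].append(victim.popleft())
                        break
            if deques[w]:
                task = deques[w].pop()      # own work: newest first
                processed[w] += 1
                deques[w].extend(expand(task))
    return processed


# Usage: a binary subdivision of depth 5 (63 tasks) shared by 4 workers.
counts = run_work_stealing([0], 4, lambda d: [d + 1, d + 1] if d < 5 else [])
```

Popping one's own work newest-first keeps a worker on the subtree it just created, while stealing oldest-first hands the thief the largest pending subtree, which tends to reduce the number of steals. As the quote notes, this bookkeeping pays off for heavy-weight tasks but can dominate when each task is very fine-grained.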
“…LGS*09]. Other methods have explored lock-free queues and work stealing to parallelize octree construction [CT08] [ZHKM08]. The underlying planner performs distance queries repeatedly to compute such a path.…”
Section: GPU Architectures (mentioning; confidence: 99%)
“…Many application studies have evaluated the use of atomic operations [31,22,6]. Focusing on the parallel Push-Relabel algorithm, Vineet et al [31] used atomic operations to address the data consistency problem.…”
Section: Related Work (mentioning; confidence: 99%)
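As a generic illustration of how atomic operations address data consistency (not the Push-Relabel method of Vineet et al.), the following minimal Python sketch uses fetch-and-add so that concurrent writers each claim a unique output slot. Python has no user-level atomic integers, so a `threading.Lock` stands in for a hardware atomic such as CUDA's `atomicAdd`. `AtomicCounter` and `compact_with_atomics` are hypothetical names.

```python
import threading


class AtomicCounter:
    """Fetch-and-add counter; the lock emulates a hardware atomic."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta=1):
        # Read and update happen as one indivisible step, so no
        # concurrent increment can be lost.
        with self._lock:
            old = self._value
            self._value = old + delta
            return old


def compact_with_atomics(items, keep, num_threads=4):
    """Stream compaction: each kept item is written to a slot claimed
    with fetch_add. Output order is nondeterministic, as on a GPU."""
    out = [None] * len(items)
    counter = AtomicCounter()

    def worker(tid):
        for i in range(tid, len(items), num_threads):
            if keep(items[i]):
                out[counter.fetch_add()] = items[i]

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[:counter.fetch_add(0)]   # number of claimed slots
```

Without the atomic, two threads could read the same counter value and overwrite each other's slot. With it, every kept item lands in a distinct position, at the cost of serializing updates to the shared counter.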
“…Blelloch [1] first introduced scan as a fundamental primitive and discussed its possible applications [2]. Later on, more and more parallel applications of scan emerged, such as sort [6,7,8,9,10,11], BFS [12,13], SpMV [13,14], parallel compaction [15], minimal spanning tree [16] and linked list prefix computations [17], etc. The (inclusive) scan problem is defined as follows: Given a sequence with n input elements: $[a_0, a_1, \ldots, a_{n-1}]$ … The ⊕ symbol denotes a binary reduction operator, which satisfies the associative law and commutative law.…”
Section: Introduction (mentioning; confidence: 99%)
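The quoted passage defines the inclusive scan, whose output is $[a_0,\ a_0 \oplus a_1,\ \ldots,\ a_0 \oplus a_1 \oplus \cdots \oplus a_{n-1}]$. The following is a minimal Python sketch of Blelloch's work-efficient scan (up-sweep then down-sweep), which produces the exclusive scan; the inclusive result follows by combining each exclusive prefix with its own element. Each inner `for i` loop corresponds to one parallel step on a GPU but runs sequentially here. The function names and the `identity` parameter are assumptions made for illustration.

```python
import operator


def blelloch_exclusive_scan(xs, op=operator.add, identity=0):
    """Work-efficient exclusive scan; pads to a power-of-two length."""
    n = len(xs)
    size = 1
    while size < n:
        size *= 2
    a = list(xs) + [identity] * (size - n)
    # Up-sweep (reduce): each step combines pairs of subtree sums.
    d = 1
    while d < size:
        for i in range(2 * d - 1, size, 2 * d):
            a[i] = op(a[i - d], a[i])
        d *= 2
    # Down-sweep: clear the root, then push prefixes down the tree.
    a[size - 1] = identity
    d = size // 2
    while d >= 1:
        for i in range(2 * d - 1, size, 2 * d):
            left = a[i - d]
            a[i - d] = a[i]              # left child gets the parent prefix
            a[i] = op(a[i], left)        # right child: prefix ⊕ left sum
        d //= 2
    return a[:n]


def inclusive_scan(xs, op=operator.add, identity=0):
    """Inclusive scan: out[i] = exclusive[i] ⊕ xs[i]."""
    excl = blelloch_exclusive_scan(xs, op, identity)
    return [op(e, x) for e, x in zip(excl, xs)]
```

For `[3, 1, 7, 0, 4, 1, 6, 3]` the exclusive scan is `[0, 3, 4, 11, 11, 15, 16, 22]` and the inclusive scan is `[3, 4, 11, 11, 15, 16, 22, 25]`. Because earlier elements are always the left operand of `op`, the sketch relies only on ⊕ being associative: `inclusive_scan(["a", "b", "c"], operator.add, "")` yields `["a", "ab", "abc"]` with the non-commutative string concatenation.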