2009
DOI: 10.1145/1498698.1564500

GPU-Quicksort

Abstract: In this article, we describe GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multicore graphics processors. Quicksort has previously been considered an inefficient sorting solution for graphics processors, but we show that in CUDA, NVIDIA's programming platform for general-purpose computations on graphical processors, GPU-Quicksort performs better than the fastest-known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a v…

Citations: cited by 55 publications (9 citation statements)
References: 15 publications
“…The source code of GPU-Quicksort is available for non-commercial use [21]. For benchmarking we used the following distributions which are defined and motivated in [22].…”
Section: Experimental Evaluation (mentioning)
confidence: 99%
“…For this reason, most implementations of GPU QuickSort like Cederman and Tsigas (2010) and Banerjee et al (2014) use a hybrid implementation where the CPU performs several scatter-gather calls to the GPU to perform the sort of a data array.…”
Section: Discussion (mentioning)
confidence: 99%
“…We experimented with several bucket sizes and number of samples in order to best fit them to the GPU memory structure. For sorting the selected sample and the bottom level sorts of the individual buckets, we experimented with several existing GPU sorting methods such as bitonic sort, adaptive bitonic sort [58] based on [27], and parallel quick sort [30].…”
Section: GPU Bucket Sort: Deterministic Sample Sort for GPUs (mentioning)
confidence: 99%
“…The thread block then sorts a sublist of n/m data items in the SM's local shared memory. We tested different implementations for the local shared memory sort within an SM, including quick sort [30], bitonic sort, and adaptive bitonic sort [27]. In our experiments, bitonic sort was consistently the fastest method, despite the fact that it requires O(n log² n) work.…”
Section: Algorithm (mentioning)
confidence: 99%
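
The last statement notes that bitonic sort won in practice despite its O(n log² n) work. A minimal Python sketch of a standard bitonic sorting network may help show where that bound comes from. It is not code from the cited paper or from GPU-Quicksort; the function name `bitonic_sort` and the comparison counter are illustrative assumptions, and the input length is assumed to be a power of two.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two.

    Returns (sorted_list, comparisons). Illustrative sketch only; the
    name and the counter are assumptions, not part of any cited code.
    """
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    comparisons = 0
    k = 2
    while k <= n:            # length of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # distance between compare-exchange partners
            # Every iteration of this loop is independent of the others,
            # so a GPU could assign one thread per element here.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    comparisons += 1
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data, comparisons
```

For n a power of two there are log₂(n)·(log₂(n)+1)/2 passes of n/2 compare-exchanges each, which gives the O(n log² n) total. The pattern of comparisons is fixed in advance and does not depend on the data, which is one plausible reason such a network can suit a thread block working in shared memory.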