2008
DOI: 10.1007/978-3-540-87744-8_21

A Practical Quicksort Algorithm for Graphics Processors

Abstract: In this paper we present GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multi-core graphics processors. Quicksort has previously been considered as an inefficient sorting solution for graphics processors, but we show that GPU-Quicksort often performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.

Cited by 98 publications (127 citation statements)
References 20 publications
“…However, it is far from being accurate: for example it does not consider the cost of nonlocal memory references that, instead, has a great impact on performance. The PRAM model has been applied to some recently proposed GPU algorithms [4]. Indeed, as we describe in this paper, experimental results contradict theoretical results that are not done using a computational model specifically designed for GPUs.…”
Section: Introduction (mentioning)
confidence: 69%
“…This means, however, that there will be a very little parallelization in the beginning, when the sequence is few [6]. To deal with this issue, Caderman [7] introduces a parallel version of quick sort combining CPU process and GPU process. The algorithm is theoretically optimal, but the data transferring between the CPU and GPU slows down the running time in practice.…”
Section: Related Work (mentioning)
confidence: 99%
“…An overview of sorting algorithms in parallel is given in [8]. A quick-sort implementation on GPU using CUDA is considered in [9] which results quick-sort as an efficient alternative to both bitonic and radix sort over GPU's for larger data sequences. Moreover bitonic sort is suggested for smaller sequences.…”
Section: Related Work (mentioning)
confidence: 99%
“…Moreover bitonic sort is suggested for smaller sequences. The quick-sort algorithm discussed in [9] uses a divide-and-conquer approach for sorting, forming left and right sequences depending on whether current value is greater or smaller than pivot value. For each recursive call, a new pivot value has to be selected.…”
Section: Related Work (mentioning)
confidence: 99%
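
The divide-and-conquer step described in the citation statement above can be illustrated with a minimal sequential sketch. This is not the GPU-Quicksort implementation from the paper; it only shows the partition-and-recurse idea. The function names and the choice of the middle element as pivot are assumptions made for illustration.

```python
from typing import List, Tuple


def partition(values: List[int], pivot: int) -> Tuple[List[int], List[int], List[int]]:
    """Split values into those smaller than, equal to, and greater than the pivot."""
    left = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    right = [v for v in values if v > pivot]
    return left, equal, right


def quicksort(values: List[int]) -> List[int]:
    """Sequential quicksort: a new pivot is selected on every recursive call."""
    if len(values) <= 1:
        return list(values)
    # Assumption: use the middle element as the pivot; other rules are possible.
    pivot = values[len(values) // 2]
    left, equal, right = partition(values, pivot)
    # The left and right sequences are independent, which is what makes
    # the later levels of the recursion amenable to parallel execution.
    return quicksort(left) + equal + quicksort(right)
```

Calling `quicksort([5, 2, 9, 2, 7])` returns a new sorted list and leaves the input unchanged. At the first level there is only one sequence to partition, which matches the observation quoted earlier that little parallelism is available at the beginning.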