2016
DOI: 10.1002/cpe.3865

Kepler GPU accelerated recursive sorting using dynamic parallelism

Abstract: This paper focuses on the performance gain obtained for multi-key quicksort on Kepler graphics processing units (GPUs). Because multi-key quicksort is a recursive algorithm, many researchers have found it tedious to parallelize on multi-core and many-core architectures. A survey of state-of-the-art string sorting algorithms and a thorough understanding of the Kepler GPU architecture gave rise to the research idea of matching the template of multi-key quicksort with t…
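To make the recursion concrete, here is a minimal sequential sketch of the classic Bentley–Sedgewick multi-key (three-way radix) quicksort. It is not the paper's GPU implementation, and the names `multikey_quicksort` and `char_at` are hypothetical. It shows the structure that makes parallelization awkward: each call partitions on one character position `d` and then makes three independent recursive calls, which is the kind of data-dependent recursion that Kepler's dynamic parallelism lets a kernel launch as child work.

```python
def multikey_quicksort(strings):
    """Sequential multi-key quicksort (Bentley-Sedgewick); returns a new sorted list."""
    a = list(strings)

    def char_at(s, d):
        # Character code at position d, or -1 past the end so shorter keys sort first.
        return ord(s[d]) if d < len(s) else -1

    def sort(lo, hi, d):
        # Sort a[lo:hi], assuming all strings there share their first d characters.
        if hi - lo <= 1:
            return
        pivot = char_at(a[lo + (hi - lo) // 2], d)
        lt, i, gt = lo, lo, hi - 1
        # Three-way partition on character d: < pivot | == pivot | > pivot.
        while i <= gt:
            c = char_at(a[i], d)
            if c < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif c > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        # Three independent subproblems (candidates for parallel child launches).
        sort(lo, lt, d)
        if pivot >= 0:  # equal keys that ended at d are already fully sorted
            sort(lt, gt + 1, d + 1)
        sort(gt + 1, hi, d)

    sort(0, len(a), 0)
    return a
```

Only the middle partition advances to the next character (`d + 1`), so a shared prefix is never re-scanned; the outer two partitions keep the same depth.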

Cited by 8 publications (9 citation statements). References: 26 publications.
“…$T_{\text{parallel-CUDA}}$ and $T_{\text{parallel-NOpenCL}}$ are defined by $T_p = T_{\text{kernel}} + T_{\text{overhead}} + T_{\text{other}}$ (5), where $T_{\text{kernel}}$ is the total execution time of the kernels on the GPU, $T_{\text{overhead}}$ is the total data-transfer overhead between the CPU and the GPU, and $T_{\text{other}}$ is the total execution time of data-structure initialization and similar steps.[50] The speedup ratio reflects the overall efficiency improvement of the parallel algorithm on the corresponding architecture compared with the sequential CPU algorithm, and can be used to evaluate the actual system speed objectively.…”
Section: Table 5, Radix Sort Algorithm Execution Time Under Differe… (mentioning)
confidence: 99%
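The timing decomposition in Eq. (5) and the speedup ratio can be sketched as below. The excerpt does not show the speedup formula itself, so this assumes the usual definition $S = T_{\text{serial}} / T_p$, with $T_{\text{serial}}$ the sequential CPU time. The class `ParallelTiming` and function `speedup` are hypothetical names, and the numbers in the usage note are illustrative only, not results from either paper.

```python
from dataclasses import dataclass


@dataclass
class ParallelTiming:
    """Components of T_p from Eq. (5); all values in one time unit (e.g. ms)."""
    kernel: float    # T_kernel: total GPU kernel execution time
    overhead: float  # T_overhead: total CPU<->GPU data-transfer time
    other: float     # T_other: data-structure initialization and similar work

    def total(self) -> float:
        # T_p = T_kernel + T_overhead + T_other
        return self.kernel + self.overhead + self.other


def speedup(t_serial: float, timing: ParallelTiming) -> float:
    """Assumed speedup ratio: sequential CPU time divided by end-to-end T_p."""
    t_p = timing.total()
    if t_p <= 0:
        raise ValueError("T_p must be positive")
    return t_serial / t_p
```

With illustrative values `ParallelTiming(kernel=8.0, overhead=3.5, other=0.5)`, $T_p = 12.0$, and a sequential time of `120.0` gives a speedup of `10.0`. Counting $T_{\text{overhead}}$ in $T_p$ makes this an end-to-end ratio rather than a kernel-only one, which is why transfer costs can noticeably lower the reported speedup.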
“…[4] The Compute Unified Device Architecture (CUDA) uses the parallel computing engine of NVIDIA graphics processing units (GPUs) to solve many complex computing tasks more efficiently than the CPU.[5] However, there are certain problems. For example, in terms of software porting, NVIDIA GPUs and AMD GPUs are not compatible with each other, so parallel algorithms are not portable between them.…”
Section: Introduction (mentioning)
confidence: 99%
“…This is a building block of suffix sorting, used in string matching and database index construction [13]. Parallel string sorting algorithms have been proposed on CPUs [14] and GPUs [15]; however, to the best of our knowledge, no hardware accelerator for this problem has been made available yet. Indeed, handling variable-length keys in hardware is not only challenging per se but also involves key comparisons that can become expensive because keys can be arbitrarily long.…”
Section: Introduction (mentioning)
confidence: 99%
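Why comparisons of variable-length keys get expensive can be seen in a small sketch: a lexicographic comparison must walk the common prefix before it can decide, so its cost grows with the prefix length rather than being constant. The function `compare_keys` and its step counter are hypothetical, for illustration only. This prefix cost is also what multi-key quicksort avoids by carrying the depth `d` of the already-matched prefix instead of restarting each comparison at position 0.

```python
def compare_keys(a: str, b: str) -> tuple[int, int]:
    """Lexicographically compare two variable-length keys.

    Returns (result, steps): result is -1, 0 or 1, and steps counts the
    character positions examined before the outcome was decided.
    """
    n = min(len(a), len(b))
    steps = 0
    for i in range(n):
        steps += 1
        if a[i] != b[i]:
            return (-1 if a[i] < b[i] else 1), steps
    # One key is a prefix of the other (or they are equal): shorter sorts first.
    if len(a) == len(b):
        return 0, steps
    return (-1 if len(a) < len(b) else 1), steps
```

For example, `compare_keys("apple", "apricot")` decides after 3 positions, while two keys sharing a 1000-character prefix need 1001 positions for a single comparison.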