2018
DOI: 10.1109/tpds.2018.2789903

A Framework for the Automatic Vectorization of Parallel Sort on x86-Based Processors

Cited by 20 publications (15 citation statements); references 30 publications.
“…[83] Use of intrinsics is error-prone and increases code-development time.[88] Also, it requires in-depth understanding of both the algorithm and SIMD intrinsics. Further, since arbitrary data-movement is not feasible, multiple data-reordering functions may need to be used to arrange the data in the desired order for computations.…”
Section: Need of Code-Rewriting (mentioning; confidence: 99%)
“…Further, since arbitrary data-movement is not feasible, multiple data-reordering functions may need to be used to arrange the data in the desired order for computations.[88] Also, since the ISA and vector width supported by different processors are different, use of intrinsics may lead to non-portable code.[88] Some works propose writing individual versions of all functions for both Phi and CPU.…”
Section: Need of Code-Rewriting (mentioning; confidence: 99%)
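To make the two quoted concerns concrete (explicit data reordering and ISA-specific code), here is a minimal hand-written sketch, assuming 32-bit floats and an x86 compiler exposing SSE through `<xmmintrin.h>`. It is not the cited paper's automatic-vectorization framework; the names `sort_columns_4x4`, `cmpxchg_ps`, `cmpxchg_f` and `USE_SSE` are hypothetical. It applies the 5-comparator sorting network for 4 keys to all four columns of a 4x4 block at once using `_mm_min_ps`/`_mm_max_ps`, then needs a shuffle-based transpose (`_MM_TRANSPOSE4_PS`) purely to move each sorted column into one register. A scalar path is kept for targets without SSE.

```c
/* Hypothetical sketch, not the cited paper's framework: sort the four columns
   of a row-major 4x4 float block with SSE intrinsics, plus a scalar fallback
   for targets where SSE is unavailable. */
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h> /* __m128, _mm_min_ps, _mm_max_ps, _MM_TRANSPOSE4_PS */
#define USE_SSE 1
#else
#define USE_SSE 0
#endif

#if USE_SSE
/* Lane-wise compare-exchange: each lane of *lo gets the min, *hi the max. */
static void cmpxchg_ps(__m128 *lo, __m128 *hi) {
    __m128 mn = _mm_min_ps(*lo, *hi);
    *hi = _mm_max_ps(*lo, *hi);
    *lo = mn;
}
#else
/* Scalar compare-exchange for the portable fallback. */
static void cmpxchg_f(float *lo, float *hi) {
    float mn = (*lo < *hi) ? *lo : *hi;
    *hi = (*lo < *hi) ? *hi : *lo;
    *lo = mn;
}
#endif

/* out[4*i .. 4*i+3] receives column i of `in`, sorted ascending. */
void sort_columns_4x4(const float in[16], float out[16]) {
#if USE_SSE
    /* One register per row, so lane j of r0..r3 holds column j. */
    __m128 r0 = _mm_loadu_ps(in + 0);
    __m128 r1 = _mm_loadu_ps(in + 4);
    __m128 r2 = _mm_loadu_ps(in + 8);
    __m128 r3 = _mm_loadu_ps(in + 12);
    /* 5-comparator sorting network for 4 keys, run on all 4 columns at once. */
    cmpxchg_ps(&r0, &r1);
    cmpxchg_ps(&r2, &r3);
    cmpxchg_ps(&r0, &r2);
    cmpxchg_ps(&r1, &r3);
    cmpxchg_ps(&r1, &r2);
    /* Each sorted column is spread across four registers; a shuffle-based
       transpose reorders it into a single register before storing. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    _mm_storeu_ps(out + 0, r0);
    _mm_storeu_ps(out + 4, r1);
    _mm_storeu_ps(out + 8, r2);
    _mm_storeu_ps(out + 12, r3);
#else
    /* Same network applied to one column at a time, written out as a row. */
    for (int j = 0; j < 4; j++) {
        float c0 = in[j], c1 = in[4 + j], c2 = in[8 + j], c3 = in[12 + j];
        cmpxchg_f(&c0, &c1);
        cmpxchg_f(&c2, &c3);
        cmpxchg_f(&c0, &c2);
        cmpxchg_f(&c1, &c3);
        cmpxchg_f(&c1, &c2);
        out[4 * j + 0] = c0;
        out[4 * j + 1] = c1;
        out[4 * j + 2] = c2;
        out[4 * j + 3] = c3;
    }
#endif
}
```

For example, rows {9, 5, 14, 0}, {2, 11, 7, 15}, {13, 1, 3, 8}, {6, 12, 10, 4} yield the sorted runs {2, 6, 9, 13}, {1, 5, 11, 12}, {3, 7, 10, 14}, {0, 4, 8, 15}. Moving to AVX (8 floats per register) or AVX-512 (16 floats) would change the network size, load/store offsets and transpose, so the intrinsics path would have to be rewritten per ISA, which is the portability cost the quoted statement describes.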
“…For improving the efficiency of the TCP process, Zhang et al. used faster mutation testing to both optimize the test case sequence and reduce the number of test cases. With recent improvements in GPU computing techniques, parallel acceleration has been used in TCP. In the application extension of TCP techniques, Wang et al. used three objectives, total time, resource usage, and density, to guide the evolution direction in the testing of videoconferencing systems.…”
Section: Related Work (mentioning; confidence: 99%)
“…The performance gain is mainly from the irregularity of the row distribution of D C. The sorting kernel has received much attention due to the pervasive need to order data in a plethora of applications. It has been parallelized and optimized on x86-based architectures [9,22] and GPUs [26,33,38,40]. Several optimized sort implementations have been included in vendor-supplied libraries, e.g., cuDPP [19], Thrust [20], ModernGPU [3], and CUB [37].…”
Section: Suffix Array Construction (mentioning; confidence: 99%)