Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering 2015
DOI: 10.1145/2668930.2688048

Design and Evaluation of Scalable Concurrent Queues for Many-Core Architectures

Abstract: As core counts increase and as heterogeneity becomes more common in parallel computing, we face the prospect of programming hundreds or even thousands of concurrent threads in a single shared-memory system. At these scales, even highly-efficient concurrent algorithms and data structures can become bottlenecks, unless they are designed from the ground up with throughput as their primary goal. In this paper, we present three contributions: (1) a characterization of queue designs in terms of modern multi- and many-…

Cited by 17 publications (4 citation statements)
References 14 publications
“…The major reason is that the Maxwell architecture dramatically improves its micro-architectures for faster atomic operations, which are extensively utilized in our approach. Actually, Scogland and Feng [25] also confirmed that atomic operations have been continuously improved in the last generations of modern GPUs. Moreover, although the AMD Fury X GPU has higher bandwidth than the NVIDIA Titan X, it is in general slower for our synchronization-free SpTRSV algorithm.…”
Section: SpTRSV Performance
Citation type: mentioning
confidence: 84%
“…The major reason is that the Pascal architecture is equipped with higher bandwidth and improved micro-architectures for atomic operations, which are extensively utilized in our approach. Actually, Scogland and Feng [39] also confirmed that atomic operations have been continuously improved in the latest generations of modern GPUs. Moreover, although the AMD Fury X GPU has slightly higher bandwidth than the NVIDIA Titan X, it is in general slower for our synchronization-free SpTRSV algorithm.…”
Section: SpTRSV Performance
Citation type: mentioning
confidence: 85%
“…An extensive body of work has embarked on the redesign of data structures for construction and general computation on the GPU [88]. Within the context of searching, these acceleration structures include sorted arrays [3], [4], [8], [51], [66], [67], [98] and linked lists [116], hash tables (see section III), spatial-partitioning trees (e.g., k-d trees [57], [115], [120], octrees [57], [119], bounding volume hierarchies (BVH) [57], [64], R-trees [71], and binary indexing trees [59], [99]), spatial-partitioning grids (e.g., uniform [36], [53], [62] and two-level [52]), skiplists [81], and queues (e.g., binary heap priority [43] and FIFO [17], [101]). Due to significant architectural differences between the CPU and GPU, search structures cannot simply be "ported" from the CPU to the GPU and maintain optimal performance.…”
Section: GPU Searching
Citation type: mentioning
confidence: 99%