2012
DOI: 10.1007/978-3-642-32820-6_87

Understanding the Performance of Concurrent Data Structures on Graphics Processors

Abstract: In this paper we revisit the design of concurrent data structures, specifically queues, and examine their performance portability with regard to the move from conventional CPUs to graphics processors. We have looked at both lock-based and lock-free algorithms and have, for comparison, implemented and optimized the same algorithms on both graphics processors and multi-core CPUs. Particular interest has been paid to study the difference between the old Tesla and the new Fermi and Kepler architectures i…

Cited by 16 publications
(12 citation statements)
References 13 publications
“…The single thread based operations tend to incur more contentions in CAS operations. As reported by Cederman et al [5], the GPU based queue operations are slower than their multi-core equivalents.…”
Section: Introduction
Citation type: mentioning
confidence: 79%
“…Many GPU based libraries supporting various programming primitives have been proposed (e.g., [3], [4]). On the other hand, the FIFO queue, which is one of the most fundamental data structures and has wide applications, has only attracted limited research efforts (e.g., [5], [6]). In this paper, we propose an efficient concurrent lock-free queue for GPGPU.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Finally, several of these queues have been evaluated on CUDA GPUs by Cederman et al. [3]. Out of a number of lock-based and two lock-free designs (i.e., MS-queue and TZ-queue), they conclude that for higher concurrency, the two lock-free queue designs are nearly always highest performing.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…By using a single thread to perform all operations on the queue at any given time, the maximum throughput at any level of contention is the same. At all numbers of threads we model the combining queue using the atomic latency for one thread in terms of Equation 3. This form is effectively the same as one would use for a serial queue, except that an additional read is performed to determine the operation to perform.…”
Section: Queue Throughput Modeling
Citation type: mentioning
confidence: 99%