Proceedings of the 22nd Annual International Conference on Supercomputing 2008
DOI: 10.1145/1375527.1375559

Fast scan algorithms on graphics processors

Abstract: Scan and segmented scan are important data-parallel primitives for a wide range of applications. We present fast, work-efficient algorithms for these primitives on graphics processing units (GPUs). We use novel data representations that map well to the GPU architecture. Our algorithms exploit shared memory to improve memory performance. We further improve the performance of our algorithms by eliminating shared-memory bank conflicts and reducing the overheads in prior shared-memory GPU algorithms. Furthermore, …
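
As a reference point for the two primitives named in the abstract, here is a minimal sequential Python sketch of a segmented inclusive scan. It assumes segments are marked with a head-flag array (1 at the first element of each segment), which is one common representation. It does not reproduce the paper's GPU data representations, and the function name `segmented_inclusive_scan` is hypothetical.

```python
import operator
from typing import Callable, List, Sequence


def segmented_inclusive_scan(values: Sequence, head_flags: Sequence[int],
                             op: Callable = operator.add) -> List:
    """Sequential reference for a segmented inclusive scan.

    head_flags[i] == 1 marks the first element of a segment; the running
    result restarts at every head, so no value crosses a segment boundary.
    """
    if len(values) != len(head_flags):
        raise ValueError("values and head_flags must have the same length")
    out = []
    acc = None
    for i, (v, flag) in enumerate(zip(values, head_flags)):
        if i == 0 or flag:
            acc = v              # a new segment starts: restart the running result
        else:
            acc = op(acc, v)     # continue the current segment with the operator
        out.append(acc)
    return out


if __name__ == "__main__":
    vals = [3, 1, 7, 0, 4, 1, 6, 3]
    flags = [1, 0, 1, 0, 0, 1, 0, 0]
    print(segmented_inclusive_scan(vals, flags))  # [3, 4, 7, 7, 11, 1, 7, 10]
```

Setting only `head_flags[0]` reduces this to an ordinary (unsegmented) inclusive scan.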

Cited by 108 publications (76 citation statements); references 9 publications.

“…There are a number of implementations of scan available in kernel languages [6,9,14,30]. These implementations encapsulate the barriers, fences, and multiple kernel invocations required in a correct and efficient implementation of scan and provide an interface through which the using programmer provides the ⊕ function.…”
Section: Proposed Methodology (mentioning, confidence: 99%)
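
The interface described in this statement, where the caller supplies only the ⊕ function, can be illustrated with a sequential Python sketch of a work-efficient (up-sweep/down-sweep) exclusive scan that takes ⊕ and its identity as parameters. This is the textbook Blelloch scheme, not necessarily the algorithm used by the cited implementations [6,9,14,30] or by this paper. The function name is hypothetical, and each inner loop stands in for one parallel step that a GPU kernel would separate from the next with a barrier.

```python
import operator
from typing import Callable, List, Sequence


def blelloch_exclusive_scan(xs: Sequence, op: Callable, identity) -> List:
    """Work-efficient exclusive scan (up-sweep then down-sweep).

    `op` must be associative and `identity` must be its identity element.
    Each pass of an inner `for i` loop models one parallel step by the
    threads of a block; a GPU kernel would place a barrier between steps.
    """
    n = len(xs)
    size = 1
    while size < n:
        size *= 2
    a = list(xs) + [identity] * (size - n)  # pad to a power of two

    # Up-sweep: combine pairs into a reduction tree stored in place.
    d = 1
    while d < size:
        for i in range(2 * d - 1, size, 2 * d):
            a[i] = op(a[i - d], a[i])       # left subtree, then right subtree
        d *= 2                              # barrier between levels

    # Down-sweep: seed the root with the identity and push prefixes down.
    a[size - 1] = identity
    d = size // 2
    while d >= 1:
        for i in range(2 * d - 1, size, 2 * d):
            left = a[i - d]
            a[i - d] = a[i]                 # left child gets the parent's prefix
            a[i] = op(a[i], left)           # right child: prefix, then left sum
        d //= 2                             # barrier between levels
    return a[:n]


if __name__ == "__main__":
    print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3], operator.add, 0))
    # [0, 3, 4, 11, 11, 15, 16, 22]
    print(blelloch_exclusive_scan(["a", "b", "c"], operator.add, ""))
    # ['', 'a', 'ab']
```

The string-concatenation call uses a non-commutative ⊕, which checks the operand order in the down-sweep. The multi-kernel part the quoted text mentions (scanning per-block totals and adding them back) is not modeled here.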
“…The height W is the workload (i.e., the number of nonzero entries to be processed) of a thread. A tile is a basic work unit in matrix-based segmented sum method [20,35], which is used as a building block in our SpMV algorithm. Actually, the term "tile" is equivalent to the term "matrix" used in original description of the segmented scan algorithms [20,35].…”
Section: Data Decomposition (mentioning, confidence: 99%)
“…A tile is a basic work unit in matrix-based segmented sum method [20,35], which is used as a building block in our SpMV algorithm. Actually, the term "tile" is equivalent to the term "matrix" used in original description of the segmented scan algorithms [20,35]. Here we use "tile" to avoid confusion between a work unit of matrix shape and a sparse matrix in SpMV.…”
Section: Data Decomposition (mentioning, confidence: 99%)
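
The tile-based segmented sum described in the two statements above can be sketched sequentially in Python. The layout is an assumption made for illustration: thread t owns the W consecutive entries t*W .. t*W + W - 1 of a tile, so each thread's workload is W entries as in the quoted definition. The function name `tile_segmented_sum` and the two-phase carry scheme are hypothetical and are not the exact method of [20,35].

```python
from typing import Dict, List, Sequence


def tile_segmented_sum(values: Sequence[float], head_flags: Sequence[int],
                       W: int) -> Dict[int, float]:
    """Segmented sum over one tile, returning {segment head index: segment sum}.

    Assumes thread t owns the W consecutive entries t*W .. t*W + W - 1, so a
    tile of len(values) entries is handled by T = len(values) // W threads.
    """
    n = len(values)
    if W <= 0 or n % W != 0 or len(head_flags) != n:
        raise ValueError("need W > 0, len(values) divisible by W, matching flags")
    if n and not head_flags[0]:
        raise ValueError("the first entry of the tile must start a segment")
    T = n // W
    leads = [0] * T                    # sum of entries before thread t's first head
    local: List[List[list]] = [[] for _ in range(T)]  # [head, partial sum] pairs

    # Phase 1 (parallel across threads on a GPU): W sequential steps per thread.
    for t in range(T):
        for i in range(t * W, (t + 1) * W):
            if head_flags[i]:
                local[t].append([i, values[i]])   # a segment starts here
            elif local[t]:
                local[t][-1][1] += values[i]      # extend this thread's open segment
            else:
                leads[t] += values[i]             # belongs to an earlier thread's segment

    # Phase 2: the last segment opened by a thread absorbs the leading sums of
    # the following threads up to the next head (done sequentially here).
    open_seg = None
    for t in range(T):
        if open_seg is not None:
            open_seg[1] += leads[t]
        if local[t]:
            open_seg = local[t][-1]

    return {head: total for column in local for head, total in column}
```

For example, `tile_segmented_sum([1, 2, 3, 4, 5, 6, 7, 8], [1, 0, 0, 1, 0, 0, 0, 1], 2)` yields `{0: 6, 3: 22, 7: 8}`; the segment starting at index 3 spans three threads. In SpMV each head would mark the first nonzero of a row. A GPU version would replace the serial phase 2 with a parallel (segmented) scan across the threads' carries.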
“…This definition is also known as exclusive scan or prescan. Dotsenko et al [15] proposed a matrix-based algorithm for prefix scan on GPUs. Input data are partitioned into matrices with α rows and b columns and each matrix is processed by a multiprocessor.…”
Section: Prefix Scan Algorithms (mentioning, confidence: 99%)
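
An exclusive scan (prescan) returns, at position i, the combination of all inputs before i, with the identity at position 0. The quoted description of the matrix-based scan stays at a high level, so the following is only a generic three-phase blocked prescan under assumptions made for illustration: a row-major layout inside each α × b matrix (`rows` × `cols` in the code), one hypothetical thread per row, and integer addition as ⊕. It shows how local scans plus carried offsets reproduce an exclusive scan, not the exact GPU algorithm of Dotsenko et al.

```python
from typing import List, Sequence


def exclusive_scan(xs: Sequence[int]) -> List[int]:
    """Sequential exclusive scan (prescan) with + and identity 0."""
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out


def matrix_prescan(xs: Sequence[int], rows: int, cols: int) -> List[int]:
    """Exclusive scan computed matrix by matrix (rows x cols per matrix)."""
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    chunk = rows * cols
    per_matrix, totals = [], []
    for start in range(0, len(xs), chunk):          # one matrix per multiprocessor
        block = list(xs[start:start + chunk])
        block += [0] * (chunk - len(block))         # pad the last matrix with 0
        matrix = [block[r * cols:(r + 1) * cols] for r in range(rows)]
        # Phase 1: scan every row on its own (assumed: one thread per row).
        row_scans = [exclusive_scan(row) for row in matrix]
        row_totals = [sum(row) for row in matrix]
        # Phase 2: scan the row totals to get each row's starting offset.
        row_offsets = exclusive_scan(row_totals)
        # Phase 3: add each row's offset to its locally scanned values.
        per_matrix.append([v + off for scan, off in zip(row_scans, row_offsets)
                           for v in scan])
        totals.append(sum(row_totals))
    # Combine matrices: scan their totals and add them as offsets (on a GPU
    # this step typically needs another kernel launch or global synchronization).
    matrix_offsets = exclusive_scan(totals)
    out = [v + off for scanned, off in zip(per_matrix, matrix_offsets)
           for v in scanned]
    return out[:len(xs)]


if __name__ == "__main__":
    data = list(range(1, 11))
    print(matrix_prescan(data, rows=2, cols=3))
    # [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
```

Padding the last matrix with the identity 0 keeps every matrix the same shape, and the padded tail is trimmed off at the end.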