2011
DOI: 10.1145/1961295.1950408

On-the-fly elimination of dynamic irregularities for GPU computing

Abstract: The power-efficient massively parallel Graphics Processing Units (GPUs) have become increasingly influential for general-purpose computing over the past few years. However, their efficiency is sensitive to dynamic irregular memory references and control flows in an application. Experiments have shown great performance gains when these irregularities are removed. But it remains an open question how to achieve those gains through software approaches on modern GPUs. This paper presents a systematic expl…

Cited by 40 publications (29 citation statements). This paper references 18 publications.

“…Finally, Zhang et al optimize the layout of irregularly accessed data to achieve more efficient GPU memory accesses by having the CPU place elements of an irregularly accessed array that will be accessed by GPU threads at the same time next to each other, resulting in a higher degree of coalesced accesses [20].…”
Section: Related Work (mentioning; confidence: 99%)
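The following rough illustration shows why the layout change quoted above helps. The Python sketch simulates on the CPU, not on a GPU, how many memory segments each warp touches when thread t reads element `indices[t]`. Several details are assumptions for illustration and do not come from Zhang et al. [20]: the warp size of 32, the 128-byte segment, the 4-byte element, the access pattern, and the helper names.

```python
# CPU-side simulation (not GPU code) of reordering irregularly accessed data
# so that elements read by a warp at the same time sit next to each other.
WARP_SIZE = 32          # assumption: 32 threads per warp
SEGMENT_BYTES = 128     # assumption: size of one coalesced memory transaction
ELEM_BYTES = 4          # assumption: 4-byte elements (e.g. float32)
ELEMS_PER_SEGMENT = SEGMENT_BYTES // ELEM_BYTES

def segments_per_warp(indices):
    """Distinct segments touched by each warp when thread t reads element indices[t]."""
    counts = []
    for start in range(0, len(indices), WARP_SIZE):
        warp = indices[start:start + WARP_SIZE]
        counts.append(len({i // ELEMS_PER_SEGMENT for i in warp}))
    return counts

def reorder_for_access(data, indices):
    """Hypothetical host-side copy: thread t will find its element at position t."""
    return [data[i] for i in indices]

# Hypothetical irregular access pattern: thread t reads data[(t * 37) % n].
n = 1024
data = list(range(n))
indices = [(t * 37) % n for t in range(n)]

before = segments_per_warp(indices)
reordered = reorder_for_access(data, indices)
after = segments_per_warp(list(range(n)))  # thread t now reads reordered[t]

assert all(reordered[t] == data[indices[t]] for t in range(n))
print(sum(before), sum(after))
```

After the host has produced `reordered`, thread t reads `reordered[t]`. All 32 reads of a warp then fall into a single segment. The price is the extra host-side copy, and the accesses it saves must amortize that cost.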
“…Another performance impacting feature of the Racah program is the degree of warp divergence [34], which can be attributed to the calculation of factorials by means of a loop. To eliminate this overhead, the factorials are pre-computed and stored in texture memory.…”
Section: Number of Coefficients, Kernel + Transfer Time (μs) (mentioning; confidence: 99%)
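The Python sketch below illustrates the trade-off described above. It assumes 32-lane warps that keep issuing loop iterations until their slowest lane finishes. `factorial_loop`, `warp_loop_steps`, `MAX_N` and the argument pattern are hypothetical. The table is an ordinary list that stands in for the texture-memory table mentioned in the quote.

```python
import math

WARP_SIZE = 32  # assumption: 32 lanes per warp

def factorial_loop(n):
    """Per-thread loop: the trip count depends on n, so lanes in a warp can diverge."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

def warp_loop_steps(ns):
    """Loop iterations issued per warp: under SIMT the warp runs until its slowest lane is done."""
    return [max(max(n - 1, 0) for n in ns[s:s + WARP_SIZE])
            for s in range(0, len(ns), WARP_SIZE)]

# Precompute once; every lane then performs a single, uniform table lookup.
MAX_N = 20  # hypothetical upper bound on the arguments
FACT_TABLE = [math.factorial(n) for n in range(MAX_N + 1)]

ns = [(5 * t) % (MAX_N + 1) for t in range(64)]  # hypothetical per-thread arguments
assert all(FACT_TABLE[n] == factorial_loop(n) for n in ns)
print(warp_loop_steps(ns))  # → [19, 19]
```

With the loop version, every warp pays for its largest argument, even if most of its lanes finish after a few iterations. The lookup version replaces that with one load per lane.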
“…The use of the if-statement and its potential to cause warp divergence is often discussed in the literature [34]. Ironically, the same cannot be said for the logical Boolean operators && and || despite the fact that they too can cause such phenomenon to occur.…”
Section: Boolean Expressions (mentioning; confidence: 99%)
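The point about `&&` and `||` follows from short-circuit evaluation. The right operand is evaluated only on lanes where the left operand does not already decide the result, so lanes within one warp can take different paths. The Python sketch below is a simulation with hypothetical helper names and predicates. It counts such warps and shows a branch-free variant. Whether a real GPU compiler emits a branch or predicated code for a given expression depends on the compiler and the operands.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp

def rhs_divergent_warps(values, lhs):
    """Count warps in which `lhs(x) && rhs(x)` would split lanes: with
    short-circuit evaluation, only lanes where lhs is true evaluate rhs."""
    count = 0
    for s in range(0, len(values), WARP_SIZE):
        outcomes = {bool(lhs(x)) for x in values[s:s + WARP_SIZE]}
        if len(outcomes) > 1:  # some lanes skip rhs while others evaluate it
            count += 1
    return count

def and_short_circuit(x):
    return x > 10 and x % 3 == 0      # rhs evaluated only when x > 10

def and_branch_free(x):
    # Evaluates both operands unconditionally and combines them with bitwise &;
    # only safe when the rhs has no side effects and is valid for every x.
    return (x > 10) & (x % 3 == 0)

values = list(range(64))
assert all(and_short_circuit(x) == and_branch_free(x) for x in values)
print(rhs_divergent_warps(values, lambda x: x > 10))  # → 1
```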
“…Zhang et al reduce divergent branches by remapping data locations [105]. Han and Abdelrahman propose compiler solutions [46].…”
Section: GPU Architecture (mentioning; confidence: 99%)
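One simple way to picture remapping data locations to reduce divergent branches is a stable partition of the input by branch outcome, so that every warp sees a uniform condition. This is a hypothetical Python simulation, not the algorithm of Zhang et al. [105]. The predicate, the input data and the helper names are illustrative only, and the warp size of 32 is an assumption.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp

def divergent_warps(data, pred):
    """Number of warps whose lanes disagree on the branch condition pred(x)."""
    return sum(
        len({bool(pred(x)) for x in data[s:s + WARP_SIZE]}) > 1
        for s in range(0, len(data), WARP_SIZE)
    )

def remap_by_branch(data, pred):
    """Stable partition: elements taking the branch first, then the rest.
    Also returns the permutation so results can be scattered back."""
    order = ([i for i, x in enumerate(data) if pred(x)]
             + [i for i, x in enumerate(data) if not pred(x)])
    return [data[i] for i in order], order

pred = lambda x: x % 2 == 1          # hypothetical branch condition
data = list(range(128))              # odd and even interleaved: every warp diverges
remapped, order = remap_by_branch(data, pred)
print(divergent_warps(data, pred), divergent_warps(remapped, pred))  # → 4 0
```

`order` records where each element came from. Per-element results computed on `remapped` can therefore be written back to their original positions.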