2010
DOI: 10.1007/978-3-642-14122-5_8

Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations

Abstract: We present a hardware mechanism which dynamically detects uniform and affine vectors used in Graphics Processing Units, to minimize pressure on the register file and reduce power consumption with minimal architectural modifications. A preliminary experimental analysis conducted with a simulator shows that this optimization can benefit up to 34% of register file reads and 22% of the computations of GPGPU applications.
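A uniform vector holds the same value in every lane of a warp, while an affine vector holds values of the form base + lane_id * stride. As a rough illustration of the property the proposed hardware detects at run time, the following host-side sketch classifies one warp-wide register (illustrative only; the function names and the 32-lane warp width are assumptions, not taken from the paper, and real hardware would track the tag incrementally rather than scan the register afterwards):

#include <cstdint>
#include <cstdio>

enum class LaneClass { Uniform, Affine, Generic };

// Classify the 32 lane values of one warp register.
// Uniform: every lane holds the same value.
// Affine:  lane i holds base + i * stride for some constant stride.
// (Assumed 32-lane warp; hypothetical names, not from the paper.)
LaneClass classify_warp_register(const int32_t lanes[32]) {
    bool uniform = true, affine = true;
    const int32_t stride = lanes[1] - lanes[0];
    for (int i = 1; i < 32; ++i) {
        if (lanes[i] != lanes[0])              uniform = false;
        if (lanes[i] - lanes[i - 1] != stride) affine  = false;
    }
    if (uniform) return LaneClass::Uniform;   // stride == 0 is also affine
    if (affine)  return LaneClass::Affine;    // e.g. base + threadIdx * 4
    return LaneClass::Generic;
}

int main() {
    int32_t uni[32], aff[32];
    for (int i = 0; i < 32; ++i) { uni[i] = 7; aff[i] = 100 + 4 * i; }
    printf("uniform: %d, affine: %d\n",
           classify_warp_register(uni) == LaneClass::Uniform,
           classify_warp_register(aff) == LaneClass::Affine);
    return 0;
}

A register tagged uniform or affine can be stored once per warp (a single value, or a base plus a stride) instead of once per lane, which is where the register-file read and computation savings reported in the abstract come from.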

Cited by 28 publications (30 citation statements)
References 8 publications (6 reference statements)
“…On the contrary, close inter-thread locality would be harmful in the context of multi-core platforms with coherent private caches, by causing false sharing of cache lines. Collange has observed a substantially different behavior in GPGPU applications [18]. In that case, inter-thread proximity is much more common, as this type of locality contributes notably to performance improvements.…”
Section: Memory Access Patterns (mentioning)
confidence: 99%
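As an illustration of the inter-thread proximity discussed in that statement, consider the following hypothetical kernel (not code from either cited paper): adjacent threads of a warp touch adjacent words of the same cache line, so on a GPU the accesses coalesce into a few wide memory transactions, whereas the same pattern spread across CPU cores with coherent private caches would cause false sharing.

__global__ void scale(float *a, float s, int n) {
    // Consecutive threads access consecutive elements: the lanes of one
    // warp read neighboring words within a single cache line / segment.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = s * a[i];
}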
“…For instance, more flexibility could be obtained using Dynamic Warp Formation [24] or Simultaneous Branch Interweaving [25]; Dynamic Warp Subdivision [9] could improve latency tolerance by allowing threads to diverge on partial cache misses; and Dynamic Scalarization [29] could further unify redundant dataflow across threads.…”
Section: Discussion (mentioning)
confidence: 99%
“…MMT and Execution Drafting primarily target data-flow redundancy. DITVA targets control-flow redundancy, although it could be extended to exploit data-flow redundancy through dynamic scalarization techniques proposed for SIMT [29]. Both MMT and Execution Drafting seek to run all threads together in lockstep as much as possible.…”
Section: E. Power and Energy (mentioning)
confidence: 99%
“…The first is the instructions that are not dependent on the thread id. These operations are scalar in nature and are also referred to as uniform vector operations [6]. The second is a special case of control divergence, where the SIMT kernel has the following 'if' statement: 'if (threadIdx.x == K) {…}' where K is a constant.…”
Section: Collaborative Execution Paradigm III: Scalar Workload (mentioning)
confidence: 99%
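A minimal kernel illustrating the two cases described in that statement (a hypothetical example; the kernel name, parameters, and the choice K = 0 are assumptions, not taken from the cited paper):

__global__ void example(const float *in, float *out, float *block_flag, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Case 1: a computation that does not depend on the thread id.
    // Every lane of the warp produces the same value, so this is a scalar
    // (uniform vector) operation that could be executed once per warp.
    float scale = 0.5f * (float)n;

    if (i < n)
        out[i] = scale * in[i];

    // Case 2: control divergence where only a single thread id takes the
    // branch, i.e. 'if (threadIdx.x == K)' with K = 0 here.
    if (threadIdx.x == 0)
        block_flag[blockIdx.x] = scale;
}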