Proceedings of the 37th Annual International Symposium on Computer Architecture 2010
DOI: 10.1145/1815961.1815992

Dynamic warp subdivision for integrated branch and memory divergence tolerance

Abstract: SIMD organizations amortize the area and power of fetch, decode, and issue logic across multiple processing units in order to maximize throughput for a given area and power budget. However, throughput is reduced when a set of threads operating in lockstep (a warp) are stalled due to long latency memory accesses. The resulting idle cycles are extremely costly. Multi-threading can hide latencies by interleaving the execution of multiple warps, but deep multi-threading using many warps dramatically increases the …
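To make the latency-hiding trade-off described in the abstract concrete, the sketch below estimates how many warps a deeply multi-threaded SIMD core would need to cover a miss; the latency and per-warp work figures are illustrative assumptions, not numbers from the paper.

```python
# Back-of-the-envelope latency hiding: how many warps must be interleaved so
# that a long-latency memory access is fully overlapped with useful work.
# All numbers below are illustrative assumptions, not figures from the paper.

memory_latency_cycles = 400    # assumed round-trip latency of a cache miss
compute_cycles_per_warp = 20   # assumed independent work per warp between misses

# While one warp waits on memory, the other warps issue compute instructions;
# full hiding needs enough warps that their combined work covers the stall.
warps_needed = 1 + memory_latency_cycles // compute_cycles_per_warp
print(f"warps needed to hide a {memory_latency_cycles}-cycle miss: {warps_needed}")

# With fewer warps, the SIMD pipeline sits idle for the uncovered cycles.
for available_warps in (4, 8, 16, 21):
    covered = (available_warps - 1) * compute_cycles_per_warp
    idle = max(0, memory_latency_cycles - covered)
    print(f"{available_warps:2d} warps -> {idle:3d} idle cycles per miss")
```

Each additional warp carries its own register and scheduling state, which is the cost that motivates subdividing existing warps rather than simply adding more of them.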

Cited by 198 publications (162 citation statements)
References 34 publications
“…For instance, more flexibility could be obtained using Dynamic Warp Formation [24] or Simultaneous Branch Interweaving [25], Dynamic Warp Subdivision [9] could improve latency tolerance by allowing threads to diverge on partial cache misses, and Dynamic Scalarization [29] could further unify redundant dataflow across threads.…”
Section: Discussion (mentioning)
confidence: 99%
“…The concept of IS corresponds to warp-split [9] in the GPU architecture literature. While thread-to-warp assignment is static, a thread-to-IS assignment is dynamic: the number of IS per warp may vary from 1 to m during execution, as does the number of threads per IS.…”
Section: The Dynamic Inter-thread Vectorization Architecture (mentioning)
confidence: 99%
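The excerpt above equates DITVA's instruction streams (IS) with warp-splits and stresses that the thread-to-IS assignment is dynamic. A minimal bookkeeping sketch of that idea follows; the names and structure are assumed for illustration rather than taken from either paper.

```python
from dataclasses import dataclass

@dataclass
class WarpSplit:
    pc: int     # program counter shared by all threads in this split
    mask: int   # bit i set => thread i currently belongs to this split

class Warp:
    """A warp of num_threads threads, dynamically partitioned into splits."""

    def __init__(self, num_threads: int, start_pc: int = 0):
        self.num_threads = num_threads
        # Initially a single split holds every thread (one IS per warp).
        self.splits = [WarpSplit(start_pc, (1 << num_threads) - 1)]

    def diverge(self, split: WarpSplit, taken_mask: int,
                taken_pc: int, fallthrough_pc: int):
        """Partition one split in two after a divergent branch; the number of
        splits per warp can therefore vary from 1 up to num_threads."""
        taken = split.mask & taken_mask
        not_taken = split.mask & ~taken_mask
        self.splits.remove(split)
        if taken:
            self.splits.append(WarpSplit(taken_pc, taken))
        if not_taken:
            self.splits.append(WarpSplit(fallthrough_pc, not_taken))

# Example: an 8-thread warp where threads 0-3 take a branch at PC 96.
w = Warp(num_threads=8)
w.diverge(w.splits[0], taken_mask=0x0F, taken_pc=100, fallthrough_pc=104)
print([(s.pc, bin(s.mask)) for s in w.splits])   # two splits with disjoint masks
```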
“…Unlike Takahashi's proposal, it does not require executing basic blocks more often than necessary. In their dynamic warp subdivision technique, Meng, Tarjan and Skadron consider opportunistic reconvergence when the PCs of several threads coincide (Meng et al, 2010). However, the main reconvergence mechanism still relies on explicit synchronization points and a mask stack.…”
Section: Implicit Reconvergence Without a Stack (unclassified)
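The opportunistic reconvergence mentioned in this excerpt, merging warp-splits whenever their PCs happen to coincide rather than only at explicit synchronization points backed by a mask stack, can be sketched as below; this is a simplified illustration, not the mechanism of either cited design.

```python
# Opportunistic reconvergence sketch (illustrative): whenever two warp-splits
# reach the same PC, merge them back into one split by OR-ing their masks,
# restoring SIMD width without waiting for an explicit reconvergence point.

def merge_coincident_splits(splits):
    """splits: list of (pc, active_mask) pairs; merges entries with equal PCs."""
    merged = {}
    for pc, mask in splits:
        merged[pc] = merged.get(pc, 0) | mask
    return sorted(merged.items())

# Two splits that diverged earlier happen to arrive at PC 200 simultaneously:
splits = [(200, 0b00000011), (200, 0b00001100), (240, 0b11110000)]
print(merge_coincident_splits(splits))
# -> [(200, 15), (240, 240)]: the two PC-200 splits merged into mask 0b00001111
```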
“…Once this data has arrived, PC arbitration takes place to give these threads the opportunity to 'catch up' with the others. This mechanism allows threads whose cache access hit to run ahead of those blocked on a miss, providing for free a feature analogous to dynamic warp subdivision (Meng et al, 2010). The proposed architecture is shown in Figure 5a.…”
Section: Similarity with the Memory Access Path (unclassified)
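The behavior this excerpt describes, where threads that hit in the cache run ahead while threads blocked on a miss are suspended and later favored by PC arbitration, is what makes it analogous to dynamic warp subdivision. A rough scheduling sketch follows; the data structures and the oldest-PC-first arbitration policy are assumptions for illustration, not the cited design.

```python
from collections import namedtuple

# Memory-divergence sketch (illustrative assumptions throughout): on a partial
# cache miss the warp is subdivided; the hitting split keeps issuing, the
# missing split is suspended, and a simple oldest-PC-first arbitration later
# lets the lagging split catch up once its data arrives.

Split = namedtuple("Split", "pc mask ready")   # ready=False while waiting on a miss

def subdivide_on_partial_miss(split, hit_mask):
    hit  = Split(split.pc + 1, split.mask & hit_mask,  True)    # hit threads continue
    miss = Split(split.pc,     split.mask & ~hit_mask, False)   # miss threads stall
    return [s for s in (hit, miss) if s.mask]

def pick_next(splits):
    """PC arbitration: among ready splits, issue the one furthest behind."""
    ready = [s for s in splits if s.ready]
    return min(ready, key=lambda s: s.pc) if ready else None

splits = subdivide_on_partial_miss(Split(pc=50, mask=0b11111111, ready=True),
                                   hit_mask=0b00111100)
print(pick_next(splits))   # the hit split (pc=51) runs ahead while the miss is outstanding
splits = [s._replace(ready=True) for s in splits]   # miss data returns
print(pick_next(splits))   # the lagging split (pc=50) now wins arbitration and catches up
```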