Proceedings of the 37th Annual International Symposium on Computer Architecture 2010
DOI: 10.1145/1815961.1815992

Dynamic warp subdivision for integrated branch and memory divergence tolerance

Abstract: SIMD organizations amortize the area and power of fetch, decode, and issue logic across multiple processing units in order to maximize throughput for a given area and power budget. However, throughput is reduced when a set of threads operating in lockstep (a warp) are stalled due to long latency memory accesses. The resulting idle cycles are extremely costly. Multi-threading can hide latencies by interleaving the execution of multiple warps, but deep multi-threading using many warps dramatically increases the …
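To make the latency-hiding trade-off described in the abstract concrete, the sketch below estimates how many warps a deeply multi-threaded SIMD core would need to cover a miss; the latency and per-warp work figures are illustrative assumptions, not numbers from the paper.

```python
# Back-of-the-envelope latency hiding: how many warps must be interleaved so
# that a long-latency memory access is fully overlapped with useful work.
# All numbers below are illustrative assumptions, not figures from the paper.

memory_latency_cycles = 400    # assumed round-trip latency of a cache miss
compute_cycles_per_warp = 20   # assumed independent work per warp between misses

# While one warp waits on memory, the other warps issue compute instructions;
# full hiding needs enough warps that their combined work covers the stall.
warps_needed = 1 + memory_latency_cycles // compute_cycles_per_warp
print(f"warps needed to hide a {memory_latency_cycles}-cycle miss: {warps_needed}")

# With fewer warps, the SIMD pipeline sits idle for the uncovered cycles.
for available_warps in (4, 8, 16, 21):
    covered = (available_warps - 1) * compute_cycles_per_warp
    idle = max(0, memory_latency_cycles - covered)
    print(f"{available_warps:2d} warps -> {idle:3d} idle cycles per miss")
```

Each additional warp carries its own register and scheduling state, which is the cost that motivates subdividing existing warps rather than simply adding more of them.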

Cited by 198 publications (162 citation statements)
References 34 publications
“…For instance, more flexibility could be obtained using Dynamic Warp Formation [24] or Simultaneous Branch Interweaving [25], Dynamic Warp Subdivision [9] could improve latency tolerance by allowing threads to diverge on partial cache misses, and Dynamic Scalarization [29] could further unify redundant dataflow across threads.…”
Section: Discussion (mentioning)
confidence: 99%
“…The concept of IS corresponds to warp-split [9] in the GPU architecture literature. While thread-to-warp assignment is static, a thread-to-IS assignment is dynamic: the number of IS per warp may vary from 1 to m during execution, as does the number of threads per IS.…”
Section: The Dynamic Inter-thread Vectorization Architecture (mentioning)
confidence: 99%
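The excerpt above equates DITVA's instruction streams (IS) with warp-splits and stresses that the thread-to-IS assignment is dynamic. A minimal bookkeeping sketch of that idea follows; the names and structure are assumed for illustration rather than taken from either paper.

```python
from dataclasses import dataclass

@dataclass
class WarpSplit:
    pc: int     # program counter shared by all threads in this split
    mask: int   # bit i set => thread i currently belongs to this split

class Warp:
    """A warp of num_threads threads, dynamically partitioned into splits."""

    def __init__(self, num_threads: int, start_pc: int = 0):
        self.num_threads = num_threads
        # Initially a single split holds every thread (one IS per warp).
        self.splits = [WarpSplit(start_pc, (1 << num_threads) - 1)]

    def diverge(self, split: WarpSplit, taken_mask: int,
                taken_pc: int, fallthrough_pc: int):
        """Partition one split in two after a divergent branch; the number of
        splits per warp can therefore vary from 1 up to num_threads."""
        taken = split.mask & taken_mask
        not_taken = split.mask & ~taken_mask
        self.splits.remove(split)
        if taken:
            self.splits.append(WarpSplit(taken_pc, taken))
        if not_taken:
            self.splits.append(WarpSplit(fallthrough_pc, not_taken))

# Example: an 8-thread warp where threads 0-3 take a branch at PC 96.
w = Warp(num_threads=8)
w.diverge(w.splits[0], taken_mask=0x0F, taken_pc=100, fallthrough_pc=104)
print([(s.pc, bin(s.mask)) for s in w.splits])   # two splits with disjoint masks
```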
“…Unlike Takahashi's proposal, it does not require executing basic blocks more often than necessary. In their dynamic warp subdivision technique, Meng, Tarjan and Skadron consider opportunistic reconvergence when the PCs of several threads coincide (Meng et al, 2010). However, the main reconvergence mechanism still relies on explicit synchronization points and a mask stack.…”
Section: Implicit Reconvergence Without a Stack (unclassified)
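The opportunistic reconvergence mentioned in this excerpt, merging warp-splits whenever their PCs happen to coincide rather than only at explicit synchronization points backed by a mask stack, can be sketched as below; this is a simplified illustration, not the mechanism of either cited design.

```python
# Opportunistic reconvergence sketch (illustrative): whenever two warp-splits
# reach the same PC, merge them back into one split by OR-ing their masks,
# restoring SIMD width without waiting for an explicit reconvergence point.

def merge_coincident_splits(splits):
    """splits: list of (pc, active_mask) pairs; merges entries with equal PCs."""
    merged = {}
    for pc, mask in splits:
        merged[pc] = merged.get(pc, 0) | mask
    return sorted(merged.items())

# Two splits that diverged earlier happen to arrive at PC 200 simultaneously:
splits = [(200, 0b00000011), (200, 0b00001100), (240, 0b11110000)]
print(merge_coincident_splits(splits))
# -> [(200, 15), (240, 240)]: the two PC-200 splits merged into mask 0b00001111
```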
“…Once this data has arrived, PC arbitration takes place to give these threads the opportunity to 'catch up' with the others. This mechanism allows threads whose cache access hit to run ahead of those blocked on a miss, providing for free a feature analogous to dynamic warp subdivision (Meng et al, 2010). The proposed architecture is shown in Figure 5a.…”
Section: Similarity with the Memory Access Path (unclassified)
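The behavior this excerpt describes, where threads that hit in the cache run ahead while threads blocked on a miss are suspended and later favored by PC arbitration, is what makes it analogous to dynamic warp subdivision. A rough scheduling sketch follows; the data structures and the oldest-PC-first arbitration policy are assumptions for illustration, not the cited design.

```python
from collections import namedtuple

# Memory-divergence sketch (illustrative assumptions throughout): on a partial
# cache miss the warp is subdivided; the hitting split keeps issuing, the
# missing split is suspended, and a simple oldest-PC-first arbitration later
# lets the lagging split catch up once its data arrives.

Split = namedtuple("Split", "pc mask ready")   # ready=False while waiting on a miss

def subdivide_on_partial_miss(split, hit_mask):
    hit  = Split(split.pc + 1, split.mask & hit_mask,  True)    # hit threads continue
    miss = Split(split.pc,     split.mask & ~hit_mask, False)   # miss threads stall
    return [s for s in (hit, miss) if s.mask]

def pick_next(splits):
    """PC arbitration: among ready splits, issue the one furthest behind."""
    ready = [s for s in splits if s.ready]
    return min(ready, key=lambda s: s.pc) if ready else None

splits = subdivide_on_partial_miss(Split(pc=50, mask=0b11111111, ready=True),
                                   hit_mask=0b00111100)
print(pick_next(splits))   # the hit split (pc=51) runs ahead while the miss is outstanding
splits = [s._replace(ready=True) for s in splits]   # miss data returns
print(pick_next(splits))   # the lagging split (pc=50) now wins arbitration and catches up
```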