2019
DOI: 10.1109/tc.2019.2906869

CD-Xbar: A Converge-Diverge Crossbar Network for High-Performance GPUs

Abstract: Modern GPUs feature an increasing number of streaming multiprocessors (SMs) to boost system throughput. How to construct an efficient and scalable network-on-chip (NoC) for future high-performance GPUs is particularly critical. Although a mesh network is a widely used NoC topology in manycore CPUs for scalability and simplicity reasons, it is ill-suited to GPUs because of the many-to-few-to-many traffic pattern observed in GPU-compute workloads. Although a crossbar NoC is a natural fit, it does not scale to la…

Citations: cited by 7 publications (2 citation statements)
References: 60 publications (83 reference statements)
“…The SM- and MC-routers have a 4-stage pipeline. We adopt two-level round-robin (2L-RR) CTA scheduling to balance the workload across the different SM-routers, i.e., 2L-RR first distributes CTAs across SM-routers and then across SMs within an SM-router [74], [75]. Other CTA scheduling policies are explored in the sensitivity analysis.…”
Section: Methods (mentioning)
Confidence: 99%
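
The two-level round-robin policy quoted above can be illustrated with a minimal Python sketch. It is not the cited authors' implementation: the function name `two_level_round_robin`, its parameters, and the static, in-order assignment of CTAs are assumptions made for illustration. A real GPU scheduler dispatches CTAs dynamically as SM resources become free.

```python
def two_level_round_robin(num_ctas, num_routers, sms_per_router):
    """Return one (router, local_sm) pair per CTA, in issue order.

    Hypothetical static model: CTAs are handed out in index order.
    """
    assignment = []
    for cta in range(num_ctas):
        # Level 1: rotate over SM-routers so consecutive CTAs land on different routers.
        router = cta % num_routers
        # Level 2: once every router has received a CTA, advance to the next SM inside each router.
        local_sm = (cta // num_routers) % sms_per_router
        assignment.append((router, local_sm))
    return assignment


if __name__ == "__main__":
    for cta, (router, sm) in enumerate(two_level_round_robin(6, 2, 2)):
        print(f"CTA {cta} -> SM-router {router}, SM {sm}")
```

Under these assumptions, consecutive CTAs alternate between SM-routers before any single router receives a second CTA, which is the load-balancing effect the statement describes.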
“…More specifically, the hardware cost of the full crossbar increases quadratically with the number of ports. Prior work [74], [75] observed that a hierarchical crossbar (H-Xbar) scales better than a full crossbar in terms of hardware complexity and power efficiency while providing the same bisection bandwidth. The H-Xbar achieves this through a two-level router structure, i.e., the SM-router and the MC-router in Figure 1, in which each router is shared by clusters of SMs and LLC slices, respectively.…”
Section: Background and Motivation (mentioning)
Confidence: 99%
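
The quadratic growth mentioned in the statement above can be checked with a small Python sketch. It assumes a simplified cost model in which hardware cost is proportional to the number of crosspoints (inputs times outputs). The function names are hypothetical, and the model does not attempt to reproduce the H-Xbar design or its cost figures.

```python
def full_crossbar_crosspoints(num_inputs, num_outputs):
    """Crosspoint count of a full crossbar: every input can reach every output."""
    return num_inputs * num_outputs


def growth_factor(ports, scale):
    """Ratio of crosspoints after multiplying the port count of a square crossbar by `scale`."""
    before = full_crossbar_crosspoints(ports, ports)
    after = full_crossbar_crosspoints(ports * scale, ports * scale)
    return after / before


if __name__ == "__main__":
    for ports in (16, 32, 64, 128):
        print(ports, full_crossbar_crosspoints(ports, ports))
    # Doubling the port count quadruples the crosspoint count.
    print(growth_factor(16, 2))
```

In this model, doubling the number of ports multiplies the crosspoint count by four, which is the scaling pressure that motivates sharing routers among clusters of SMs and LLC slices.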