2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) 2019
DOI: 10.1109/hpca.2019.00042
|View full text |Cite
|
Sign up to set email alerts
|

Architecting Waferscale Processors - A GPU Case Study

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

0
9
0

Year Published

2020
2020
2022
2022

Publication Types

Select...
7
2
1

Relationship

0
10

Authors

Journals

citations
Cited by 36 publications
(9 citation statements)
references
References 55 publications
0
9
0
Order By: Relevance
“…Next, we evaluate the performance improvement that multinode packaged systems (e.g., MCM-GPU [17], waferscale-GPU [18], Tesla Dojo [19]) can provide in a distributed training setup (see Fig. 11).…”
Section: Effect Of Multi-node Packagementioning
confidence: 99%
“…Next, we evaluate the performance improvement that multinode packaged systems (e.g., MCM-GPU [17], waferscale-GPU [18], Tesla Dojo [19]) can provide in a distributed training setup (see Fig. 11).…”
Section: Effect Of Multi-node Packagementioning
confidence: 99%
“…Data is physically moved around, and therefore scheduling the communication operations is of critical importance to minimize qubit movements and consolidate interactions with a given qubit in the minimum amount of time. NoCs are generally not bound to scheduling, although efforts in real-time embedded systems or machine learning accelerators also advocate for it in the classical domain [15,22]. In any case, this aspect is at the frontier between the network and the architecture, Welcome back to circuit switching: Quantum teleportation uses both a classical channel and a quantum channel to transmit the information: the measurement output at the Tx node (2 bits), and the entangled photon qubit pair.…”
Section: Comparison With Network-on-chipmentioning
confidence: 99%
“…Graph Partitioning Based on Resource: In some of the emerging architectures, graph partitioning algorithms can be used to assign tasks to the best execution unit available. In [44], a wafer-scale architecture is proposed to minimize communication overheads and memory access latency. It aims to schedule TBs with high data sharing to adjacent processing modules.…”
Section: Related Workmentioning
confidence: 99%