2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca52012.2021.00010
Ten Lessons From Three Generations Shaped Google’s TPUv4i: Industrial Product

Cited by 187 publications (104 citation statements)
References 20 publications
“…In the case of Layerweaver, while most of the BERT-large (NLP) requests satisfy the QoS constraints, over 90% of the MobileNetV2 (vision) requests violate them. Due to the growing importance of support for multi-tenancy on NPUs, the lack of QoS is a serious drawback in datacenters [3].…”
Section: Limitations of the Prior Art
confidence: 99%
“…For example, Google TPUv3 [2], which targets both DNN training and inference, features 128 TOP/s of computation and 900 GB/s off-chip memory bandwidth. In contrast, TPUv4i [3] targets DNN inference only, and its compute-to-memory bandwidth ratio is substantially higher. On the other hand, DNN models in service have very different arithmetic intensities depending on their layer structures, operators, etc. Thus, there is no one-size-fits-all accelerator that works well for all of those DNN models.…”
Section: Introduction
confidence: 99%
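The compute-to-memory-bandwidth contrast in the quote above is usually framed with the roofline model. A minimal sketch using the quoted TPUv3 figures (128 TOP/s peak compute, 900 GB/s off-chip bandwidth); the function names and the 50 ops/byte kernel are illustrative assumptions, not from the cited papers.

```python
def ridge_point(peak_ops_per_s: float, mem_bw_bytes_per_s: float) -> float:
    """Arithmetic intensity (ops/byte) at which a kernel shifts from
    memory-bound to compute-bound in the roofline model."""
    return peak_ops_per_s / mem_bw_bytes_per_s

def attainable_ops(intensity: float, peak_ops_per_s: float,
                   mem_bw_bytes_per_s: float) -> float:
    """Attainable throughput: the lesser of peak compute and
    bandwidth times arithmetic intensity."""
    return min(peak_ops_per_s, mem_bw_bytes_per_s * intensity)

TPU_V3_PEAK = 128e12  # 128 TOP/s (quoted figure)
TPU_V3_BW = 900e9     # 900 GB/s  (quoted figure)

# Ridge point ~142 ops/byte: kernels below it are memory-bound.
ridge = ridge_point(TPU_V3_PEAK, TPU_V3_BW)

# A hypothetical kernel at 50 ops/byte is memory-bound on this machine:
print(attainable_ops(50, TPU_V3_PEAK, TPU_V3_BW))  # 4.5e13 ops/s
```

This is why a higher compute-to-bandwidth ratio (as in TPUv4i) only pays off for models whose arithmetic intensity sits above the ridge point.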
“…As opposed to SysAr, TPU v3 [29] changes the VU structure to accelerate less arithmetically intensive operations such as the inverse square root of BN while training, albeit without elaborating on details about processing DW-CONV in the modified VU. In addition, as TPU v4 [28] reuses hardware designs of TPU v3 except for several components such as on-chip memory capacity, on-chip interconnect, and DMA, the VU of TPU v4 has the same structure as that of TPU v3. There have been processing-near-DRAM studies [10,14,31] to provide high off-chip memory bandwidth during inference.…”
Section: Related Work
confidence: 99%
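To make the "inverse square root of BN" remark concrete: batch normalization divides by the standard deviation, which vector units typically compute as a multiply by a reciprocal square root. A minimal NumPy sketch of that operation, purely illustrative and not TPU code:

```python
import numpy as np

def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each feature over the batch: (x - mean) * rsqrt(var + eps).
    The rsqrt is the low-arithmetic-intensity op the quote refers to."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    inv_std = 1.0 / np.sqrt(var + eps)  # the inverse-square-root step
    return (x - mean) * inv_std

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm(x)  # each column now has mean ~0 and std ~1
```

Because this is a handful of element-wise ops per byte moved, it is memory-bandwidth-bound, which is why the quote singles it out as a target for VU changes rather than the matrix unit.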
“…A spatial compute array is the key component in many popular low-cost CNN accelerators [50, 58, 97, 113–123].…”
Section: Spatial Architectures for CNN Inference
confidence: 99%
“…By orchestrating data into and out of the PE network, spatial architectures can efficiently implement either matrix multiplications or convolutions. Examples of spatial architectures include Eyeriss V1/V2 [50, 113], Google’s TPU [97, 117], NVIDIA’s CUDA Tensor Cores [124], Nanofabrics [125], TRIPS [126], RAW [127], SmartMemories [128], FlexFlow [114–116], SCNN [129], and Morph [130]. Figure 2 illustrates the core elements of a common spatial architecture for CNN inference.…”
Section: Spatial Architectures for CNN Inference
confidence: 99%
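The matrix-multiply role of a spatial PE array described above can be sketched in a few lines. This is a generic output-stationary model, in which each (i, j) grid position stands for one PE holding its output accumulator while operands stream through over time steps; it is an illustrative abstraction, not the dataflow of any specific accelerator cited.

```python
def spatial_matmul(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Model an output-stationary spatial array computing C = A @ B.
    C[i][j] is the accumulator inside PE (i, j); the outer loop over t
    models streaming the shared (reduction) dimension through the array."""
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0] * m for _ in range(n)]          # one accumulator per PE
    for t in range(k):                        # time step: broadcast A[:,t], B[t,:]
        for i in range(n):
            for j in range(m):
                C[i][j] += A[i][t] * B[t][j]  # one MAC in PE (i, j)
    return C

print(spatial_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

In hardware the i and j loops run fully in parallel (one MAC per PE per cycle), so an n-by-m array finishes the product in k steps instead of n*m*k.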