2019
DOI: 10.1007/978-3-030-16621-2_61

Batch Size Influence on Performance of Graphic and Tensor Processing Units During Training and Inference Phases

Abstract: The impact of the maximally possible batch size (for the best runtime) on the performance of graphics processing units (GPU) and tensor processing units (TPU) during the training and inference phases is investigated. Numerous runs of the selected deep neural network (DNN) were performed on the standard MNIST and Fashion-MNIST datasets. A significant speedup was obtained even for extremely low-scale usage of Google TPUv2 units (8 cores only) in comparison to the quite powerful NVIDIA Tesla K80 GPU card with the …
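The abstract describes sweeping the batch size of a small DNN on MNIST-class data and comparing training and inference throughput across accelerators. As a rough illustration of that kind of experiment (not the paper's exact network, batch-size grid, or measurement protocol — all of those are assumptions here), a minimal Keras timing loop could look like this:

```python
# Minimal sketch (not the paper's exact protocol): time training and inference
# of a small DNN on MNIST for a sweep of batch sizes. The architecture and the
# batch-size grid below are illustrative assumptions.
import time
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

for batch_size in [64, 128, 256, 512, 1024, 2048]:
    model = build_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    start = time.time()
    model.fit(x_train, y_train, batch_size=batch_size, epochs=1, verbose=0)
    train_time = time.time() - start

    start = time.time()
    model.predict(x_test, batch_size=batch_size, verbose=0)
    infer_time = time.time() - start

    print(f"batch={batch_size:5d}  "
          f"train: {len(x_train) / train_time:8.0f} samples/s  "
          f"inference: {len(x_test) / infer_time:8.0f} samples/s")
```

On a GPU or TPU backend, the throughput reported by such a loop typically grows with batch size until the device saturates, which is the effect the paper quantifies.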

Cited by 24 publications (12 citation statements)
References 22 publications (33 reference statements)
“…The computation of Eq. (7) therefore operates on different sizes of tensors, which is sub-optimal for efficient batching in GPU and TPU [20]. For batching purpose, we perform the attention over all codewords in each codebook, fixing the "context size" of the attention to 𝑊:…”
Section: Linear-time Self-Attention
confidence: 99%
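The statement above cites the paper for the observation that variable-size tensors batch poorly on GPU/TPU, and therefore fixes the attention "context size" to 𝑊 codewords per codebook so that all tensors share one shape. A purely illustrative sketch of that idea (not the citing paper's model; the shapes B, H, W, d and the one-query-per-codebook layout are assumptions) is:

```python
# Illustrative only: scaled dot-product attention over a fixed number W of
# codewords per codebook, so every example produces tensors of identical shape
# and batches efficiently on GPU/TPU. Shapes and names (B, H, W, d) are assumed.
import tensorflow as tf

B, H, W, d = 32, 8, 64, 128                  # batch, codebooks, codewords, dim

queries   = tf.random.normal((B, H, 1, d))   # one query per codebook (assumed)
codewords = tf.random.normal((B, H, W, d))   # fixed "context size" W per codebook

scores  = tf.matmul(queries, codewords, transpose_b=True) / d ** 0.5  # (B, H, 1, W)
weights = tf.nn.softmax(scores, axis=-1)                              # attention weights
output  = tf.matmul(weights, codewords)                               # (B, H, 1, d)
print(output.shape)                                                   # (32, 8, 1, 128)
```

Because W is the same for every codebook and every example, a single batched matrix multiply covers the whole attention step, which is the batching efficiency the quotation refers to.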
“…It offers cloud services specialized for ML (Amazon EC2 P3 instances) and is equipped with an NVIDIA Tesla V100 Graphics Processing Unit (GPU). GPU computational capacity values for ML can be found in [18], where we select an average value of 6000 training samples/sec. Finally, the computational tasks for the cloud server include training (in the CML case) and model parameter aggregation (FML, EML cases).…”
Section: Training Process and Entities Computational Characteristics
confidence: 99%
“…For the former we select an average value of 40,000 training samples/sec, assuming a Data Center is equipped with a Tensor Processing Unit (TPU). For the latter (model parameter aggregation), no reference values can be found in the literature, thus we rely on an empirical approach; we measure the average capacity for training and aggregation tasks in our personal computer (PC) setup (i.e., 6250 training samples/sec and 1.56 model aggregations/sec respectively) and compare against the training capacity reference value of 40,000 training samples/sec that was selected, according to [18]. Assuming a linear relation, the average cloud aggregation capacity is calculated as 10 model aggregations/sec.…”
Section: Training Process and Entities Computational Characteristics
confidence: 99%
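The "linear relation" step quoted above can be verified with simple arithmetic (the figures are those cited in the statement; the variable names below are my own):

```python
# Back-of-the-envelope check of the linear-scaling argument quoted above.
# The numbers come from the citation statement; variable names are assumptions.
pc_train_rate    = 6250    # training samples/sec measured on the PC setup
pc_agg_rate      = 1.56    # model aggregations/sec measured on the PC setup
cloud_train_rate = 40000   # training samples/sec selected for the TPU-equipped cloud

# If aggregation capacity scales linearly with training capacity:
cloud_agg_rate = pc_agg_rate * cloud_train_rate / pc_train_rate
print(round(cloud_agg_rate, 2))   # 9.98, i.e. roughly the 10 aggregations/sec used in the text
```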
“…It offers cloud services specialized for ML and is equipped with an NVIDIA Tesla V100 Graphics Processing Unit (GPU). Thus, an edge node's computational capacity equals that of a GPU's computational capacity, whose values for ML tasks can be found in [19], from where we select an average value of 6000 training samples/sec.…”
Section: UE and Servers' Computational Capacity
confidence: 99%