2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
DOI: 10.1109/ccgrid.2017.110

Scaling a Convolutional Neural Network for Classification of Adjective Noun Pairs with TensorFlow on GPU Clusters

Abstract: Deep neural networks have gained popularity in recent years, obtaining outstanding results in a wide range of applications, such as computer vision, in both academia and multiple industry areas. The progress made in recent years cannot be understood without taking into account the technological advancements seen in key domains such as High Performance Computing, and more specifically in the Graphics Processing Unit (GPU) domain. These kinds of deep neural networks need massive amounts of data to effectively train the …

Cited by 12 publications (7 citation statements)
References 11 publications
“…Upon execution, workers remove images (one at a time) from the shared queue until it is exhausted. This mechanism ensures that multiple GPU runtimes evenly divide the workload among the GPUs and achieve quasi‐linear acceleration at the application level, where a perfect linear speed‐up is unattainable because of model loading and memory transfer overhead [63] …”
Section: Discussion
confidence: 99%
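The shared-queue mechanism described in the statement above can be illustrated with a short Python sketch. This is a minimal illustration under assumed names, not code from the cited works: `load_model` and `run_inference` are hypothetical placeholders for framework-specific model loading and per-image inference.

```python
import multiprocessing as mp
from queue import Empty


def load_model(device):
    # Hypothetical placeholder for framework-specific model loading.
    return device


def run_inference(model, image_path):
    # Hypothetical placeholder for per-image inference.
    print(f"{model}: processed {image_path}")


def gpu_worker(gpu_id, image_queue):
    """One process per GPU: pull images until the shared queue is exhausted."""
    model = load_model(device=f"cuda:{gpu_id}")  # loading cost paid once per worker
    while True:
        try:
            image_path = image_queue.get_nowait()  # one image at a time
        except Empty:
            break  # queue exhausted, worker exits
        run_inference(model, image_path)


def run_on_gpus(image_paths, num_gpus):
    # All images go into one shared queue; workers drain it in parallel,
    # so faster GPUs simply take more items and the load stays balanced.
    queue = mp.Queue()
    for path in image_paths:
        queue.put(path)
    workers = [mp.Process(target=gpu_worker, args=(g, queue))
               for g in range(num_gpus)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


if __name__ == "__main__":
    run_on_gpus([f"img_{i}.jpg" for i in range(8)], num_gpus=2)
```

Because model loading and host-to-device transfers happen outside the per-image loop, they are not amortized away entirely, which is why the quoted statement reports quasi-linear rather than perfectly linear speed-up.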
“…A learning rate proportional to the batch size, learning rate warmup, batch normalization, and a transition from SGD to the RMSProp optimizer are some of the techniques presented in these works. A study of distributed training methods using the ResNet-50 architecture on an HPC cluster is shown in [10,11]. To learn more about the algorithms used in this field we refer to [8].…”
Section: Related Work
confidence: 99%
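The learning-rate techniques named in this statement (a rate proportional to the batch size, plus warmup) can be sketched as a simple schedule. The constants below are illustrative assumptions, not values from the cited works.

```python
# Linear-scaling + warmup schedule sketch (illustrative values only).
BASE_LR = 0.1        # assumed reference learning rate, tuned for BASE_BATCH
BASE_BATCH = 256     # assumed reference batch size
WARMUP_EPOCHS = 5    # assumed length of the warmup phase


def learning_rate(epoch, global_batch_size):
    """Scale the LR with the global batch size and ramp it up during warmup."""
    target_lr = BASE_LR * global_batch_size / BASE_BATCH  # proportional to batch size
    if epoch < WARMUP_EPOCHS:
        # Linear warmup: grow toward the target to avoid early divergence
        # at large batch sizes.
        return target_lr * (epoch + 1) / WARMUP_EPOCHS
    return target_lr


# Example: 8 workers, per-worker batch of 256 -> global batch of 2048.
for epoch in range(8):
    print(epoch, learning_rate(epoch, global_batch_size=2048))
```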
“…For this reason, experimenting with several workers is crucial to minimize the amount of time spent on these tasks. We test the same model and training procedure with two of the most widely used frameworks for training Deep Learning models, PyTorch and TensorFlow [10]. In both cases we use their native APIs to perform synchronous distributed training across several GPUs by means of data parallelism, where training on each GPU is done in its own process.…”
Section: Parallel Platforms
confidence: 99%
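As one concrete instance of the one-process-per-GPU synchronous data parallelism this statement describes, the sketch below uses PyTorch's DistributedDataParallel with a toy model and random data. It is an assumed setup, not the quoted work's training code, and the TensorFlow counterpart (e.g. `tf.distribute`) is not shown.

```python
# One process per GPU; gradients are all-reduced synchronously each step.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train(local_rank):
    # Rendezvous info (MASTER_ADDR, RANK, WORLD_SIZE, ...) is expected in the
    # environment, as set by a launcher such as torchrun.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)   # toy model stand-in
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(100):                                 # toy training loop
        # Each process trains on its own shard of the data (random here).
        x = torch.randn(32, 128, device=f"cuda:{local_rank}")
        y = torch.randint(0, 10, (32,), device=f"cuda:{local_rank}")
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()   # DDP averages gradients across all processes here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    # Typically launched with: torchrun --nproc_per_node=<num_gpus> script.py
    train(int(os.environ.get("LOCAL_RANK", 0)))
```

Running one process per GPU, as the quoted statement notes, avoids contention inside a single Python runtime and lets the framework's collective communication handle the synchronous gradient exchange.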