2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
DOI: 10.1109/ccgrid.2017.110

Scaling a Convolutional Neural Network for Classification of Adjective Noun Pairs with TensorFlow on GPU Clusters

Abstract: Deep neural networks have gained popularity in recent years, obtaining outstanding results in a wide range of applications, such as computer vision, in both academia and multiple industry areas. The progress made in recent years cannot be understood without taking into account the technological advancements seen in key domains such as High Performance Computing, and more specifically in the Graphics Processing Unit (GPU) domain. These kinds of deep neural networks need massive amounts of data to effectively train the …

Cited by 12 publications (7 citation statements)
References 11 publications
“…Upon execution, workers remove images (one at a time) from the shared queue until it is exhausted. This mechanism ensures that multiple GPU runtimes evenly divide the workload among the GPUs and achieve quasi‐linear acceleration at the application level, where a perfect linear speed‐up is unattainable because of model loading and memory transfer overhead [63] …”
Section: Discussion
confidence: 99%
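The shared-queue mechanism described in the statement above can be illustrated with a short Python sketch. This is a minimal illustration under assumed names, not code from the cited works: `load_model` and `run_inference` are hypothetical placeholders for framework-specific model loading and per-image inference.

```python
import multiprocessing as mp
from queue import Empty


def load_model(device):
    # Hypothetical placeholder for framework-specific model loading.
    return device


def run_inference(model, image_path):
    # Hypothetical placeholder for per-image inference.
    print(f"{model}: processed {image_path}")


def gpu_worker(gpu_id, image_queue):
    """One process per GPU: pull images until the shared queue is exhausted."""
    model = load_model(device=f"cuda:{gpu_id}")  # loading cost paid once per worker
    while True:
        try:
            image_path = image_queue.get_nowait()  # one image at a time
        except Empty:
            break  # queue exhausted, worker exits
        run_inference(model, image_path)


def run_on_gpus(image_paths, num_gpus):
    # All images go into one shared queue; workers drain it in parallel,
    # so faster GPUs simply take more items and the load stays balanced.
    queue = mp.Queue()
    for path in image_paths:
        queue.put(path)
    workers = [mp.Process(target=gpu_worker, args=(g, queue))
               for g in range(num_gpus)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


if __name__ == "__main__":
    run_on_gpus([f"img_{i}.jpg" for i in range(8)], num_gpus=2)
```

Because model loading and host-to-device transfers happen outside the per-image loop, they are not amortized away entirely, which is why the quoted statement reports quasi-linear rather than perfectly linear speed-up.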
“…A learning rate proportional to the batch size, learning rate warmup, batch normalization, and a transition from SGD to the RMSProp optimizer are some of the techniques presented in these works. A study of distributed training methods using the ResNet-50 architecture on an HPC cluster is shown in [10,11]. To learn more about the algorithms used in this field we refer to [8].…”
Section: Related Work
confidence: 99%
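The learning-rate techniques named in this statement (a rate proportional to the batch size, plus warmup) can be sketched as a simple schedule. The constants below are illustrative assumptions, not values from the cited works.

```python
# Linear-scaling + warmup schedule sketch (illustrative values only).
BASE_LR = 0.1        # assumed reference learning rate, tuned for BASE_BATCH
BASE_BATCH = 256     # assumed reference batch size
WARMUP_EPOCHS = 5    # assumed length of the warmup phase


def learning_rate(epoch, global_batch_size):
    """Scale the LR with the global batch size and ramp it up during warmup."""
    target_lr = BASE_LR * global_batch_size / BASE_BATCH  # proportional to batch size
    if epoch < WARMUP_EPOCHS:
        # Linear warmup: grow toward the target to avoid early divergence
        # at large batch sizes.
        return target_lr * (epoch + 1) / WARMUP_EPOCHS
    return target_lr


# Example: 8 workers, per-worker batch of 256 -> global batch of 2048.
for epoch in range(8):
    print(epoch, learning_rate(epoch, global_batch_size=2048))
```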
“…For this reason, experimenting with several workers is crucial to minimize the amount of time spent on these tasks. We test the same model and training procedure with two of the most widely used frameworks for training Deep Learning models, PyTorch and TensorFlow [10]. In both cases we use their native APIs to perform synchronous distributed training across several GPUs by means of data parallelism, where training on each GPU is done in its own process.…”
Section: Parallel Platforms
confidence: 99%
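As one concrete instance of the one-process-per-GPU synchronous data parallelism this statement describes, the sketch below uses PyTorch's DistributedDataParallel with a toy model and random data. It is an assumed setup, not the quoted work's training code, and the TensorFlow counterpart (e.g. `tf.distribute`) is not shown.

```python
# One process per GPU; gradients are all-reduced synchronously each step.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train(local_rank):
    # Rendezvous info (MASTER_ADDR, RANK, WORLD_SIZE, ...) is expected in the
    # environment, as set by a launcher such as torchrun.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)   # toy model stand-in
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(100):                                 # toy training loop
        # Each process trains on its own shard of the data (random here).
        x = torch.randn(32, 128, device=f"cuda:{local_rank}")
        y = torch.randint(0, 10, (32,), device=f"cuda:{local_rank}")
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()   # DDP averages gradients across all processes here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    # Typically launched with: torchrun --nproc_per_node=<num_gpus> script.py
    train(int(os.environ.get("LOCAL_RANK", 0)))
```

Running one process per GPU, as the quoted statement notes, avoids contention inside a single Python runtime and lets the framework's collective communication handle the synchronous gradient exchange.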