André Viebke scite author profile

André Viebke

4 Publications

33 Citation Statements Received

101 Citation Statements Given


Affiliations

Linnaeus University

Publications


The Potential of the Intel (R) Xeon Phi for Supervised Deep Learning

Viebke

Pllana

2015


Abstract: Supervised learning of Convolutional Neural Networks (CNNs), also known as supervised Deep Learning, is a computationally demanding process. To find the most suitable parameters of a network for a given application, numerous training sessions are required. Therefore, reducing the training time per session is essential to fully utilize CNNs in practice. While numerous research groups have addressed the training of CNNs using GPUs, so far not much attention has been paid to the Intel Xeon Phi coprocessor. In this paper we investigate empirically and theoretically the potential of the Intel Xeon Phi for supervised learning of CNNs. We design and implement a parallelization scheme named CHAOS that exploits both the thread- and SIMD-parallelism of the coprocessor. Our approach is evaluated on the Intel Xeon Phi 7120P using the MNIST dataset of handwritten digits for various thread counts and CNN architectures. Results show a 103.5x speedup when training our large network for 15 epochs using 244 threads, compared to one thread on the coprocessor. Moreover, we develop a performance model and use it to assess our implementation and answer what-if questions.


CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi

et al. 2017


Deep learning is an important component of Big Data analytic tools and intelligent applications, such as self-driving cars, computer vision, speech recognition, or precision medicine. However, the training process is computationally intensive and often requires a large amount of time if performed sequentially. Modern parallel computing systems provide the capability to reduce the required training time of deep neural networks. In this paper, we present our parallelization scheme for training convolutional neural networks (CNNs) named Controlled Hogwild with Arbitrary Order of Synchronization (CHAOS). Major features of CHAOS include the support for thread and vector parallelism, non-instant updates of weight parameters during back-propagation without a significant delay, and implicit synchronization in arbitrary order. CHAOS is tailored for parallel computing systems that are accelerated with the Intel Xeon Phi. We evaluate our parallelization approach empirically using measurement techniques and performance modeling for various numbers of threads and CNN architectures. Experimental results for the MNIST dataset of handwritten digits using the total number of threads on the Xeon Phi show speedups of up to 103× compared to the execution on one thread of the Xeon Phi, 14× compared to the sequential execution on Intel Xeon E5, and 58× compared to the sequential execution on Intel Core i5.


Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures

Viebke

Pllana

Memeti

et al. 2019


Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures

Viebke

Pllana

Memeti

et al. 2019

Preprint


Many complex problems, such as natural language processing or visual object detection, are solved using deep learning. However, efficient training of complex deep convolutional neural networks for large data sets is computationally demanding and requires parallel computing resources. In this paper, we present two parameterized performance models for estimating the execution time of training convolutional neural networks on the Intel many integrated core architecture. While for the first performance model we minimally use measurement techniques for parameter value estimation, in the second model we estimate more parameters based on measurements. We evaluate the prediction accuracy of the performance models in the context of training three different convolutional neural network architectures on the Intel Xeon Phi. The achieved average performance prediction accuracy is about 15% for the first model and 11% for the second model.


