Large-scale distributed L-BFGS

Najafabadi, Maryam M.; Khoshgoftaar, Taghi M.; Villanustre, Flavio; Holt, J. Darrin

doi:10.1186/s40537-017-0084-5

Cited by 32 publications

(19 citation statements)

References 18 publications


“…HPCC systems is being used in a wide range of applications including parameter estimation for improving machine learning models [38] and cyber security analytics [39][40][41]. The healthcare applications utilizing HPCC platforms show great potential of HPCC in this domain as well, as it covers a wide range of applications detecting organized crime in healthcare using social network analytics [42].…”

Section: HPCC Related

mentioning

confidence: 99%

HPCC based framework for COPD readmission risk analysis

et al. 2019


Prevention of hospital readmissions has the potential of providing better quality of care to patients and delivering significant cost savings. A review of existing readmission analysis frameworks based on data type, data size, disease conditions, algorithms and other features shows that existing frameworks do not address the issue of using large amounts of data, which is fundamental to readmission prediction analysis. Available patient data for readmission risk analysis has high dimensionality and a large number of instances. Further, more new data is produced every day, which can be used on a continuous basis to improve the predictive power of risk models. This study proposes a High Performance Computing Cluster based Big Data readmission risk analysis framework which uses the Naive Bayes classification algorithm. The study shows that the overall evaluation time using Big Data and a parallel computing platform can be significantly decreased, while maintaining model performance.


“…Parallel quasi-Newton methods have been explored in several directions: map-reduce (vector-free L-BFGS) [29] has been used to parallelize the two-loop recursion (See more discussion in Section 2) in a deterministic way; the distributed L-BFGS [35] is focused on the implementation of L-BFGS over high performance computing cluster (HPCC) platform, e.g. how to distribute data such that a full gradient or the two-loop recur-…”

“Table 1: The comparison of various quasi-Newton methods in terms of their stochastic, parallel, asynchronous frameworks, convergence rates, and if they use a variance reduction technique and limited memory update of BFGS method to make it work well for high dimensional problem.…”

Section: Introduction

mentioning

confidence: 99%

Asynchronous parallel stochastic Quasi-Newton methods

Tong, Liang, Cai et al. 2021

Parallel Computing


Although first-order stochastic algorithms, such as stochastic gradient descent, have been the main force to scale up machine learning models, such as deep neural nets, the second-order quasi-Newton methods start to draw attention due to their effectiveness in dealing with ill-conditioned optimization problems. The L-BFGS method is one of the most widely used quasi-Newton methods. We propose an asynchronous parallel algorithm for stochastic quasi-Newton (AsySQN) method. Unlike prior attempts, which parallelize only the calculation for gradient or the two-loop recursion of L-BFGS, our algorithm is the first one that truly parallelizes L-BFGS with a convergence guarantee. Adopting the variance reduction technique, a prior stochastic L-BFGS, which has not been designed for parallel computing, reaches a linear convergence rate. We prove that our asynchronous parallel scheme maintains the same linear convergence rate but achieves significant speedup. Empirical evaluations in both simulations and benchmark datasets demonstrate the speedup in comparison with the non-parallel stochastic L-BFGS, as well as the better performance than first-order methods in solving ill-conditioned problems.


“…Previously, HPCC systems had only more traditional Machine Learning algorithms, some Deep Learning algorithms that worked on a single node and a single algorithm that worked in a distributed fashion. Najafabadi et al. [4] proposed a distributed L-BFGS algorithm on HPCC systems but their approach was limited to the capabilities of HPCC systems. Since Deep Learning is well suited for Big Data analytics [5] and HPCC excels at Big Data processing [6], our approach leverages the capabilities of both HPCC systems and different third-party Python libraries to train single Deep Learning networks using multiple nodes in parallel.…”

mentioning

confidence: 99%

“…Commodity computing is defined as a cluster computing system comprised of individual, relatively cheap and easy to obtain computers connected with standard networking protocols 4. For the purpose of this paper, the configured system has one Thor process per physical node and the term node is used interchangeably with the term process and worker.…”

mentioning

confidence: 99%

A parallel and distributed stochastic gradient descent implementation using commodity clusters

Kennedy, Khoshgoftaar, Villanustre et al. 2019

J Big Data

Self Cite


Introduction: Training neural networks effectively and efficiently is an important component of Deep Learning. Large neural networks can consist of dozens, hundreds or even thousands of layers, each with thousands of artificial neurons. Depending on the network's architecture, each of these neurons is connected to a large number of other neurons, where each connection has a trainable weight parameter that determines how the network responds to input signals. In the context of this paper, the effective training of these large complex networks is accomplished through the computationally expensive process of backpropagation. Additionally, neural networks benefit from training on Big Data, as typically more data produces more performant models [1]. For example, AlexNet was trained on roughly 1.2 million images from the ImageNet database, and at the time achieved state-of-the-art results [2]. Problems of this magnitude are common, and thus researching parallel network optimization on distributed and parallel systems is highly important.

Abstract: Deep Learning is an increasingly important subdomain of artificial intelligence, which benefits from training on Big Data. The size and complexity of the model combined with the size of the training dataset make the training process very computationally and temporally expensive. Accelerating the training process of Deep Learning using cluster computers faces many challenges, ranging from distributed optimizers to the large communication overhead specific to systems with off-the-shelf networking components. In this paper, we present a novel distributed and parallel implementation of stochastic gradient descent (SGD) on a distributed cluster of commodity computers. We use high-performance computing cluster (HPCC) systems as the underlying cluster environment for the implementation. We give an overview of how the HPCC systems platform provides the environment for distributed and parallel Deep Learning and how it provides a facility to work with third-party open source libraries such as TensorFlow, and we detail our use of third-party libraries and HPCC functionality for implementation. We provide experimental results that validate our work and show that our implementation can scale with respect to both dataset size and the number of compute nodes in the cluster.
