Proceedings of the 2014 International Conference on Big Data Science and Computing
DOI: 10.1145/2640087.2644155

Scaling Distributed Machine Learning with the Parameter Server

Abstract: We propose a parameter server framework for distributed machine learning problems. Both data and workloads are distributed over worker nodes, while the server nodes maintain globally shared parameters, represented as dense or sparse vectors and matrices. The framework manages asynchronous data communication between nodes, and supports flexible consistency models, elastic scalability, and continuous fault tolerance. To demonstrate the scalability of the proposed framework, we show experimental results on petabytes of real data with billions of examples and parameters on problems ranging from Sparse Logistic Regression to Latent Dirichlet Allocation and Distributed Sketching.
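To make the division of labor in the abstract concrete, the snippet below is a minimal single-process sketch of the pattern, not the paper's actual API: a server object holds the globally shared parameter vector, and worker threads stand in for worker nodes, each pulling current parameters, computing a gradient on its own data shard, and pushing the update back without synchronizing with the other workers. All names (`ParameterServer`, `push`, `pull`, `worker`) are illustrative assumptions.

```python
import threading

import numpy as np

class ParameterServer:
    """Illustrative server node holding a globally shared parameter vector."""

    def __init__(self, dim):
        self.params = np.zeros(dim)
        self.lock = threading.Lock()

    def pull(self, keys):
        # Workers fetch only the (possibly sparse) entries they need.
        with self.lock:
            return self.params[keys]

    def push(self, keys, grads, lr=0.1):
        # Asynchronous update: workers push whenever they finish, no barrier.
        with self.lock:
            self.params[keys] -= lr * grads

def worker(server, shard, dim):
    """Illustrative worker node: trains on its own shard of the data."""
    keys = np.arange(dim)
    for x, y in shard:
        w = server.pull(keys)      # read the current shared parameters
        grad = (w @ x - y) * x     # least-squares gradient for one example
        server.push(keys, grad)    # send the update back asynchronously

if __name__ == "__main__":
    dim = 4
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=dim)
    data = [(x, w_true @ x) for x in rng.normal(size=(200, dim))]

    server = ParameterServer(dim)
    # Both data and workload are split across workers (threads stand in for nodes).
    shards = [data[i::4] for i in range(4)]
    threads = [threading.Thread(target=worker, args=(server, s, dim)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("true:   ", np.round(w_true, 2))
    print("learned:", np.round(server.params, 2))
```

In a real deployment the threads and lock become network RPCs to dedicated server machines, and the push/pull boundary is also where a flexible consistency model (for example, bounded staleness instead of this fully asynchronous scheme) would be enforced.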

Cited by 1,162 publications (1,179 citation statements). References 28 publications.
“…To effectively orchestrate multiple machines for a training task, the system must provide a way to manage the globally shared model parameters. The parameter server architecture, i.e., a cluster of machines to manage parameters, is widely used to reduce I/O latency when handling parameter updates [35,36]. As shown in Figure 1, parameter servers maintain the latest parameter values and serve all workers.…”
Section: Distributed Training
confidence: 99%
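The statement above attributes the latency benefit to using a cluster of server machines rather than a single one. A common way to realize this is to partition the parameter key space so that each server owns a contiguous key range and absorbs only its share of the push/pull traffic; a hypothetical sketch of that routing logic follows (the `Router` class and its range boundaries are assumptions for illustration, not the cited system's actual scheme).

```python
import bisect

class Router:
    """Illustrative key-range partitioning: each server owns a contiguous
    key range, so traffic for different ranges hits different machines."""

    def __init__(self, num_keys, servers):
        self.servers = servers
        step = num_keys // len(servers)
        # Server i owns keys in [bounds[i], bounds[i + 1]).
        self.bounds = [i * step for i in range(len(servers))] + [num_keys]

    def server_for(self, key):
        i = bisect.bisect_right(self.bounds, key) - 1
        return self.servers[min(i, len(self.servers) - 1)]

router = Router(num_keys=1000, servers=["server0", "server1", "server2", "server3"])
print(router.server_for(0), router.server_for(999))  # server0 server3
```

Because a pull for keys in [0, 250) and a push for keys in [750, 1000) land on different machines, neither request queues behind the other, which is the I/O-latency argument the quoted passage is making.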
“…However, Hadoop provides performance superior to that of its in-memory counterparts discussed in [8] and [9]. There are other powerful and versatile systems with low-level programming interfaces [10], [11]. The problem with them is that they are specialized and do not provide a general high-level programming interface, scheduling, or other needed mechanisms.…”
Section: Related Work
confidence: 99%
“…As its name implies, machine learning is able to "learn" the highly complicated relationships between the independent and dependent variables via non-linear "black box" data processing. During the past decades, it has been widely used in many scientific and industrial areas, such as biology [7][8][9], medicine [10][11][12], energy [13][14][15][16][17][18][19], environment [20][21][22], engineering [23][24][25], and information technology (IT) [26,27]. These application studies indicate that machine learning techniques have dramatically boosted the development of many different areas.…”
Section: Introduction
confidence: 99%