Proceedings of the 2014 International Conference on Big Data Science and Computing
DOI: 10.1145/2640087.2644155

Scaling Distributed Machine Learning with the Parameter Server

Abstract: We propose a parameter server framework for distributed machine learning problems. Both data and workloads are distributed over worker nodes, while the server nodes maintain globally shared parameters, represented as dense or sparse vectors and matrices. The framework manages asynchronous data communication between nodes, and supports flexible consistency models, elastic scalability, and continuous fault tolerance. To demonstrate the scalability of the proposed framework, we show experimental results on petabytes of real data with billions of examples and parameters on problems ranging from Sparse Logistic Regression to Latent Dirichlet Allocation and Distributed Sketching.
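To make the division of labor in the abstract concrete, the snippet below is a minimal single-process sketch of the pattern, not the paper's actual API: a server object holds the globally shared parameter vector, and worker threads stand in for worker nodes, each pulling current parameters, computing a gradient on its own data shard, and pushing the update back without synchronizing with the other workers. All names (`ParameterServer`, `push`, `pull`, `worker`) are illustrative assumptions.

```python
import threading

import numpy as np

class ParameterServer:
    """Illustrative server node holding a globally shared parameter vector."""

    def __init__(self, dim):
        self.params = np.zeros(dim)
        self.lock = threading.Lock()

    def pull(self, keys):
        # Workers fetch only the (possibly sparse) entries they need.
        with self.lock:
            return self.params[keys]

    def push(self, keys, grads, lr=0.1):
        # Asynchronous update: workers push whenever they finish, no barrier.
        with self.lock:
            self.params[keys] -= lr * grads

def worker(server, shard, dim):
    """Illustrative worker node: trains on its own shard of the data."""
    keys = np.arange(dim)
    for x, y in shard:
        w = server.pull(keys)      # read the current shared parameters
        grad = (w @ x - y) * x     # least-squares gradient for one example
        server.push(keys, grad)    # send the update back asynchronously

if __name__ == "__main__":
    dim = 4
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=dim)
    data = [(x, w_true @ x) for x in rng.normal(size=(200, dim))]

    server = ParameterServer(dim)
    # Both data and workload are split across workers (threads stand in for nodes).
    shards = [data[i::4] for i in range(4)]
    threads = [threading.Thread(target=worker, args=(server, s, dim)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("true:   ", np.round(w_true, 2))
    print("learned:", np.round(server.params, 2))
```

In a real deployment the threads and lock become network RPCs to dedicated server machines, and the push/pull boundary is also where a flexible consistency model (for example, bounded staleness instead of this fully asynchronous scheme) would be enforced.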

Cited by 1,162 publications (1,179 citation statements). References 28 publications.
“…To effectively orchestrate multiple machines for a training task, the system must provide a way to manage the globally shared model parameters. The parameter server architecture, i.e., a cluster of machines to manage parameters, is widely used to reduce I/O latency when handling parameter updates [35,36]. As shown in Figure 1, parameter servers maintain the latest parameter values and serve all workers.…”
Section: Distributed Training
confidence: 99%
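The statement above attributes the latency benefit to using a cluster of server machines rather than a single one. A common way to realize this is to partition the parameter key space so that each server owns a contiguous key range and absorbs only its share of the push/pull traffic; a hypothetical sketch of that routing logic follows (the `Router` class and its range boundaries are assumptions for illustration, not the cited system's actual scheme).

```python
import bisect

class Router:
    """Illustrative key-range partitioning: each server owns a contiguous
    key range, so traffic for different ranges hits different machines."""

    def __init__(self, num_keys, servers):
        self.servers = servers
        step = num_keys // len(servers)
        # Server i owns keys in [bounds[i], bounds[i + 1]).
        self.bounds = [i * step for i in range(len(servers))] + [num_keys]

    def server_for(self, key):
        i = bisect.bisect_right(self.bounds, key) - 1
        return self.servers[min(i, len(self.servers) - 1)]

router = Router(num_keys=1000, servers=["server0", "server1", "server2", "server3"])
print(router.server_for(0), router.server_for(999))  # server0 server3
```

Because a pull for keys in [0, 250) and a push for keys in [750, 1000) land on different machines, neither request queues behind the other, which is the I/O-latency argument the quoted passage is making.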
“…However, Hadoop provides performance superior to that of its in-memory counterparts discussed in [8] and [9]. There are other powerful and versatile systems with low-level programming interfaces [10], [11]. The problem with them is that they are specialized and do not provide a general high-level programming interface, scheduling, or other needed mechanisms.…”
Section: Related Work
confidence: 99%
“…As its name implies, machine learning is able to "learn" the highly complicated relationships between the independent and dependent variables via non-linear "black box" data processing. During the past decades, it has been widely used in many scientific and industrial areas, such as biology [7][8][9], medicine [10][11][12], energy [13][14][15][16][17][18][19], environment [20][21][22], engineering [23][24][25], and information technology (IT) [26,27]. These application studies indicate that machine learning techniques have dramatically boosted the development of many different areas.…”
Section: Introduction
confidence: 99%