2020
DOI: 10.14778/3407790.3407796

Dynamic parameter allocation in parameter servers

Abstract: To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers ease the implementation of distributed parameter management, a key concern in distributed training, but can induce severe communication overhead. To reduce this overhead, distributed machine learning algorithms use techniques to increase parameter access locality (PAL), achieving up to linear speed-ups. We found that existing parameter serve…
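
The abstract's central idea, keeping frequently accessed parameters local to the worker that needs them instead of always reaching over the network, can be made concrete with a small client-side sketch. The Python sketch below is illustrative only: the class, the method names, and the messaging-layer calls are assumptions, not the API described in the paper.

    # Minimal sketch of a parameter-server client with a relocation primitive.
    # All names (ParameterClient, pull, push, localize) and the messaging-layer
    # calls are illustrative assumptions, not the paper's actual API.

    class ParameterClient:
        def __init__(self, rank, key_to_node, local_store, network):
            self.rank = rank                # id of this node
            self.key_to_node = key_to_node  # dynamic mapping: parameter key -> owning node
            self.local_store = local_store  # dict: key -> value, for locally owned keys
            self.network = network          # stand-in for the messaging layer (assumed)

        def pull(self, key):
            """Read a parameter value, locally if owned, otherwise over the network."""
            owner = self.key_to_node[key]
            if owner == self.rank:
                return self.local_store[key]               # local access, no round trip
            return self.network.request_value(owner, key)  # remote access, one round trip

        def push(self, key, update):
            """Apply an additive update to a parameter."""
            owner = self.key_to_node[key]
            if owner == self.rank:
                self.local_store[key] += update
            else:
                self.network.send_update(owner, key, update)

        def localize(self, key):
            """Relocate a parameter to this node so that subsequent pull/push
            calls hit the local store (parameter access locality)."""
            owner = self.key_to_node[key]
            if owner != self.rank:
                value = self.network.take_ownership(owner, key)  # assumed primitive
                self.local_store[key] = value
                self.key_to_node[key] = self.rank  # in a real system this mapping change
                                                   # must also be propagated to other nodes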

Cited by 13 publications (21 citation statements). References 40 publications.

“…In contrast to static full replication, a classic PS uses network bandwidth only when parameters are actually accessed. However, a classic PS is often inefficient due to access latency [41,42]. Figure 1 depicts the performance of both approaches for a task of training large-scale knowledge graph embeddings.…”
Section: Model Quality (mentioning)
Confidence: 99%
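
One way to read this comparison is as two different per-step costs: static full replication pays bandwidth for every parameter every step, while a classic parameter server pays a network round trip for each parameter a worker actually accesses. The back-of-the-envelope Python sketch below uses purely illustrative numbers (not measurements from the cited papers) to show how either side can dominate.

    # Illustrative per-step cost model for one worker. All constants are assumed
    # for illustration; they are not measurements from the cited papers.

    total_params    = 100_000_000  # parameters in the model
    touched_params  = 50_000       # parameters this worker accesses per step (sparse access)
    bytes_per_param = 4            # float32

    bandwidth_bytes_per_s = 10e9 / 8  # roughly a 10 Gbit/s link
    round_trip_latency_s  = 100e-6    # roughly 100 microseconds per uncached remote access

    # Static full replication: synchronize all parameters every step (bandwidth-bound).
    replication_cost = total_params * bytes_per_param / bandwidth_bytes_per_s

    # Classic PS: one round trip per accessed parameter, worst case (latency-bound).
    classic_ps_cost = touched_params * round_trip_latency_s

    print(f"full replication: ~{replication_cost:.2f} s/step")
    print(f"classic PS:       ~{classic_ps_cost:.2f} s/step")

With these assumed numbers the classic PS is latency-bound and slower, which is the inefficiency the statement refers to; with denser access or a slower link, replication's bandwidth term dominates instead.
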
“…For example, the Petuum PS [12,16] selectively replicates parameters on specific nodes when the nodes access these parameters. The Lapse PS [42] dynamically relocates parameters among nodes to hide access latency. Multi-technique PSs [41,57] combine different parameter management techniques (e.g., replication and relocation) and pick a suitable one for each parameter.…”
Section: Model Quality (mentioning)
Confidence: 99%
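
The techniques named here (selective replication, relocation, and multi-technique combinations) amount to a per-parameter choice of management strategy. The sketch below shows one hypothetical way such a choice could be driven by access statistics; the policy and thresholds are assumptions for illustration, not the logic of Petuum, Lapse, or the multi-technique systems cited.

    # Hypothetical per-parameter policy: replicate hot, read-mostly parameters;
    # relocate parameters accessed intensively by one node at a time; otherwise
    # leave them on their home server. Thresholds are illustrative assumptions.

    from enum import Enum

    class Technique(Enum):
        REPLICATE = "replicate"  # keep copies on accessing nodes, synchronize updates
        RELOCATE  = "relocate"   # move the single copy to the node that needs it
        CLASSIC   = "classic"    # leave it on its home server, access remotely

    def choose_technique(accessing_nodes, accesses_per_step, updates_per_step):
        """Pick a management technique from simple access statistics."""
        if accessing_nodes > 1 and updates_per_step < 0.1 * accesses_per_step:
            return Technique.REPLICATE  # many readers, few writes: replication amortizes
        if accessing_nodes == 1 and accesses_per_step > 100:
            return Technique.RELOCATE   # one intensive user: relocation hides latency
        return Technique.CLASSIC

    # Example: an embedding row used heavily by a single worker during one phase.
    print(choose_technique(accessing_nodes=1, accesses_per_step=5000, updates_per_step=5000))

The design point the cited systems share is that no single technique fits all parameters, so the decision is made per parameter rather than globally.
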
“…Each worker executes the training algorithm over its local partition, and synchronizes with other workers from time to time. A typical implementation of data parallelism is parameter server [2,29,30,45,50,63,84]. Another popular implementation is message passing interface (MPI) [38], e.g., the AllReduce MPI primitive leveraged by MLlib [72], XGBoost [27], PyTorch [64], etc [60].…”
Section: Related Work (mentioning)
Confidence: 99%
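
The data-parallel pattern described here is the same regardless of whether synchronization goes through a parameter server or an AllReduce primitive: each worker computes updates on its own partition and periodically merges them with the others. A minimal Python sketch of the AllReduce-style variant follows; the function name and the model object are placeholders, not the API of MPI, MLlib, XGBoost, or PyTorch.

    # Sketch of data-parallel training on one worker. `allreduce(grad)` is assumed
    # to return the element-wise sum of `grad` across all workers, as the MPI
    # AllReduce primitive would; `model` is a placeholder with `gradient` and
    # `apply_update` methods.

    def train_data_parallel(local_batches, model, num_workers, allreduce, lr=0.01):
        for batch in local_batches:
            grad = model.gradient(batch)    # gradient on this worker's local batch
            summed = allreduce(grad)        # synchronize with the other workers
            model.apply_update(-lr * summed / num_workers)  # averaged SGD step
        return model
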
“…Also, we utilize a modified distributed scheme to speed it up. Parsa [15] proposes a distributed partition algorithm to reduce the communication overhead. As it aims at memory-resident PS, Parsa does not take disk I/O cost into account, which cannot be neglected in our DRPS.…”
Section: Parameter Index and Partition (mentioning)
Confidence: 99%
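
The distinction drawn here is about the cost model behind the partitioning: a memory-resident formulation can optimize network traffic alone, while a disk-resident PS also pays for disk reads. The sketch below is a hypothetical objective function that includes both terms; it is not the formulation used by Parsa or DRPS.

    # Hypothetical partitioning objective with both a network and a disk term.
    # The cost model and constants are assumptions for illustration only.

    def assignment_cost(assignment, accesses, net_cost=1.0, disk_cost=10.0,
                        in_memory=frozenset()):
        """assignment: {param_key: owning_node}
        accesses:   {(param_key, node): access_count}
        in_memory:  keys that are cached in memory on their owning node."""
        total = 0.0
        for (key, node), count in accesses.items():
            if assignment[key] != node:   # remote access: pay for network traffic
                total += net_cost * count
            if key not in in_memory:      # parameter resides on disk: pay for reads
                total += disk_cost * count
        # A memory-resident formulation would drop the disk term entirely.
        return total
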