Proceedings of the 2021 International Conference on Management of Data
DOI: 10.1145/3448016.3459240

Towards Demystifying Serverless Machine Learning Training

Abstract: The appeal of serverless (FaaS) has triggered a growing interest in how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML). Several systems exist for training large-scale ML models on top of serverless infrastructures (e.g., AWS Lambda), but with inconclusive results in terms of their performance and relative advantage over "serverful" infrastructures (IaaS). In this paper we present a systematic, comparative study of distributed ML training over FaaS and IaaS. We p…

Cited by 64 publications (31 citation statements)
References 50 publications (54 reference statements)

“…They found the most challenging issues of using serverless computing runtimes for training machine learning models to be their ephemerality, statelessness, and warm-up latency. In [13], Jiang et al. present a comparative study of distributed ML training over FaaS and IaaS. They found that serverless training is only cost-effective for models with low communication overhead and quick convergence.…”
Section: Related Work
confidence: 99%
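
To make the quoted cost-effectiveness finding concrete, here is a back-of-the-envelope comparison in Python. All numbers (prices, worker counts, durations) are assumed for illustration and are not taken from the paper; the point is only that FaaS bills every worker for its full wall-clock time, so time stalled on communication is paid for directly.

    # Illustrative FaaS-vs-IaaS training cost model -- all numbers assumed.
    LAMBDA_GB_SECOND = 1.66667e-05   # assumed FaaS price per GB-second
    VM_HOURLY = 0.17                 # assumed IaaS price per VM-hour

    def faas_cost(workers, mem_gb, seconds):
        # FaaS charges each worker for its entire wall-clock time,
        # so seconds spent stalled on communication are billed too.
        return workers * mem_gb * seconds * LAMBDA_GB_SECOND

    def iaas_cost(vms, hours):
        return vms * hours * VM_HOURLY

    # Fast-converging, communication-light job: FaaS is competitive.
    print(faas_cost(workers=10, mem_gb=2, seconds=120))    # ~$0.04
    # Same job slowed 10x by storage-based communication: FaaS loses.
    print(faas_cost(workers=10, mem_gb=2, seconds=1200))   # ~$0.40
    print(iaas_cost(vms=2, hours=0.5))                     # ~$0.17

Under these assumed prices, the communication-light job is cheaper on FaaS while the communication-bound variant is cheaper on a pair of VMs, which is the shape of the trade-off the quotation describes.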
“…It also enables implementing truly distributed data processing operators such as joins, instead of today's contrived solutions, which need to communicate through storage [24,25]. Similar ideas apply to ML over serverless, which today is expensive due to the lack of direct communication [19]. Boxer can also be used to implement a form of work stealing among the functions of a serverless application, since idle functions could communicate directly with one another to request additional work.…”
Section: Opportunities
confidence: 99%
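
The work-stealing opportunity is easy to sketch once functions can dial each other directly. The following is a hypothetical illustration, not Boxer's actual API: it assumes only that two functions can open a plain TCP connection to each other, which is the capability the quotation attributes to Boxer.

    import json
    import socket

    def steal_work(peer_host, peer_port):
        # Hypothetical protocol: an idle function dials a busy peer
        # directly (no storage hop) and asks for part of its queue.
        with socket.create_connection((peer_host, peer_port), timeout=5) as s:
            s.sendall(b'{"op": "steal"}\n')
            reply = s.makefile().readline()
            return json.loads(reply)["tasks"]

    def serve_one_steal(port, queue):
        # Busy peer: answer a single steal request with half its tasks.
        with socket.create_server(("0.0.0.0", port)) as srv:
            conn, _ = srv.accept()
            with conn:
                conn.makefile().readline()  # consume the steal request
                keep, give = queue[: len(queue) // 2], queue[len(queue) // 2 :]
                queue[:] = keep
                conn.sendall(json.dumps({"tasks": give}).encode() + b"\n")

The contrast with today's FaaS platforms is that this exchange would otherwise require both sides to poll a shared store, paying latency and request costs on every hop.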
“…Serverless Machine Learning. The majority of previous works [28], [29], [30] in this domain have focused on distributed ML training using FaaS functions. Siren [28] allows users to train ML models in the cloud using fine-grained functions, thereby removing the burden of non-trivial cluster provisioning and management from developers.…”
Section: Related Work
confidence: 99%
“…It provides a lightweight worker runtime for cloud functions that supports various ML models. Jiang et al. [30] analyze the cost-performance trade-offs between Infrastructure-as-a-Service (IaaS) and FaaS for distributed ML. Towards this, they developed LambdaML, which supports different distributed ML variants, such as synchronous vs. asynchronous training and purely FaaS-based or hybrid (FaaS/IaaS) training.…”
Section: Related Work
confidence: 99%
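
The synchronous/asynchronous distinction can be illustrated with a minimal sketch. This is not LambdaML's code; it assumes a generic put/get object-store interface, with an in-memory dict standing in for something like S3, which pure FaaS workers would use to exchange state.

    import numpy as np

    store = {}  # in-memory stand-in for an object store such as S3

    def put(key, value): store[key] = value
    def get(key): return store.get(key)

    def synchronous_round(worker_id, n_workers, grad, rnd):
        # Synchronous variant: publish the local gradient, then average
        # only once every peer's gradient for this round is visible.
        put(f"grad/{rnd}/{worker_id}", grad)
        peers = [get(f"grad/{rnd}/{w}") for w in range(n_workers)]
        if any(p is None for p in peers):
            return None                    # not all peers done; keep polling
        return np.mean(peers, axis=0)      # averaged gradient for this step

    def asynchronous_step(grad, lr=0.1):
        # Asynchronous variant: read the latest shared model, apply the
        # local gradient immediately, and write back without waiting.
        put("model", get("model") - lr * grad)
        return get("model")

    put("model", np.zeros(3))
    print(asynchronous_step(np.array([1.0, -2.0, 0.5])))  # [-0.1  0.2 -0.05]

The synchronous variant pays a storage round-trip and a barrier per round; the asynchronous one trades that coordination cost for potentially stale updates.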