Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations) 2023
DOI: 10.18653/v1/2023.acl-demo.54
Petals: Collaborative Inference and Fine-tuning of Large Models

Abstract: Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them inaccessible to most researchers. In this work, we investigate methods for cost-efficient inference and fine-tuning of LLMs, comparing local and distributed strategies. We observe that a large enough model (50B+) can run efficiently even on geodistributed devices in a consumer-gra…
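For context, the system described in the abstract is exposed to users through a Transformers-style Python client. The sketch below shows how such a client is typically invoked, based on the publicly documented petals package; the model repository name and exact class name are illustrative assumptions and may differ between releases.

```python
# Minimal sketch of running inference through the Petals client.
# Assumes the `petals` and `transformers` packages are installed;
# the model repository name below is illustrative.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "bigscience/bloom-petals"  # assumption: a model served on a public swarm
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Only the embeddings run locally; the transformer blocks are executed by
# remote servers, which is what makes 50B+ models usable on consumer hardware.
inputs = tokenizer("A quick test of distributed inference:", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```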

Cited by 5 publications (3 citation statements)
References 24 publications
“…Borzunov et al. [12] proposed an innovative approach to distributing the workload of LLMs across multiple servers. This method splits the LLM across consumer hardware and executes it in a distributed manner, incorporating dynamic quantization and server load balancing along with fault tolerance.…”
Section: Related Work
Mentioning confidence: 99%
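To make the cited partitioning idea concrete, the sketch below assigns transformer blocks to heterogeneous servers so that per-server load stays roughly balanced. The greedy rule and the throughput values are illustrative assumptions, not the actual Petals load-balancing protocol.

```python
# Hedged sketch: greedy assignment of transformer blocks to servers by
# estimated capacity. Illustrates the idea described in the citation; it is
# not the algorithm used by Petals itself.
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    throughput: float                  # assumed relative capacity of the server
    blocks: list = field(default_factory=list)

    @property
    def load(self) -> float:
        # Load grows with the number of hosted blocks, shrinks with capacity.
        return len(self.blocks) / self.throughput

def assign_blocks(num_blocks: int, servers: list) -> list:
    """Greedily give each block to the currently least-loaded server."""
    for block_id in range(num_blocks):
        target = min(servers, key=lambda s: s.load)
        target.blocks.append(block_id)
    return servers

if __name__ == "__main__":
    swarm = [Server("gpu-a", 3.0), Server("gpu-b", 1.0), Server("gpu-c", 2.0)]
    for server in assign_blocks(num_blocks=24, servers=swarm):
        print(server.name, server.blocks)
```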
“…The selection of edge devices for services is based on the pheromone trail strength, which reflects the accumulated experience of previous iterations, and a heuristic factor that incorporates domain-specific knowledge or performance metrics. This decision-making process is followed by an immediate update to the local pheromone value after each service placement (lines 11-15). This local update serves as a rapid feedback technique, guiding subsequent selections within the same cycle.…”
Section: Algorithm 1: EdgeGenACO
Mentioning confidence: 99%
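As an illustration of the selection rule quoted above, the following sketch combines a pheromone term and a heuristic term into a selection probability and applies a local pheromone update right after each placement. The parameters (alpha, beta, rho, tau0) and the update rule follow generic ant-colony-optimization conventions and are assumptions, not the exact EdgeGenACO formulation.

```python
# Hedged sketch of ant-colony-style edge-device selection with a local
# pheromone update applied immediately after each service placement.
import random

ALPHA, BETA = 1.0, 2.0   # weights of pheromone vs. heuristic information
RHO, TAU0 = 0.1, 0.01    # local evaporation rate and baseline pheromone

def select_device(pheromone: dict, heuristic: dict) -> str:
    """Pick a device with probability proportional to tau^alpha * eta^beta."""
    weights = {d: (pheromone[d] ** ALPHA) * (heuristic[d] ** BETA) for d in pheromone}
    total = sum(weights.values())
    return random.choices(list(weights), [w / total for w in weights.values()])[0]

def local_update(pheromone: dict, device: str) -> None:
    """Evaporate part of the pheromone on the chosen device (rapid feedback)."""
    pheromone[device] = (1 - RHO) * pheromone[device] + RHO * TAU0

if __name__ == "__main__":
    pheromone = {"edge-1": 0.5, "edge-2": 0.5, "edge-3": 0.5}
    heuristic = {"edge-1": 0.9, "edge-2": 0.4, "edge-3": 0.7}  # e.g. inverse latency
    for service in range(5):
        chosen = select_device(pheromone, heuristic)
        local_update(pheromone, chosen)   # guides later selections in the same cycle
        print(f"service {service} -> {chosen}")
```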
“…The concept of local-SGD (or FedAvg) has previously been applied in the realm of language modeling. Cross-device federated learning, for instance, has been utilized to pretrain and fine-tune language models (Borzunov et al., 2022; Diskin et al., 2021a; Hilmkil et al., 2021; Presser, 2020; Ro et al., 2022; Ryabinin et al., 2021). More recently, DiLoCo has extended the local-SGD methodology to larger language models, specifically proposing the use of AdamW + Nesterov momentum as the InnerOpt + OuterOpt pairing.…”
Section: Local-SGD for Language Modeling
Mentioning confidence: 99%
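To make the inner/outer optimizer pairing mentioned in the last citation concrete, here is a hedged PyTorch-style sketch of local-SGD in the DiLoCo spirit: each worker runs several AdamW steps locally, and the averaged parameter delta is applied by an outer SGD optimizer with Nesterov momentum. The worker count, step counts, and toy model are illustrative assumptions, not the cited setup.

```python
# Hedged sketch of DiLoCo-style local-SGD: AdamW as the inner optimizer on
# each worker, SGD with Nesterov momentum as the outer optimizer applied to
# the averaged update. Single-process simulation with toy sizes.
import copy
import torch
from torch import nn

torch.manual_seed(0)
global_model = nn.Linear(16, 4)                       # stand-in for a language model
outer_opt = torch.optim.SGD(global_model.parameters(), lr=0.7,
                            momentum=0.9, nesterov=True)

NUM_WORKERS, INNER_STEPS = 4, 8

for outer_round in range(3):
    deltas = [torch.zeros_like(p) for p in global_model.parameters()]
    for _ in range(NUM_WORKERS):
        worker = copy.deepcopy(global_model)          # each worker starts from the global weights
        inner_opt = torch.optim.AdamW(worker.parameters(), lr=1e-3)
        for _ in range(INNER_STEPS):                  # local steps on (simulated) local data
            x, y = torch.randn(32, 16), torch.randn(32, 4)
            loss = nn.functional.mse_loss(worker(x), y)
            inner_opt.zero_grad(); loss.backward(); inner_opt.step()
        for d, p_global, p_local in zip(deltas, global_model.parameters(),
                                        worker.parameters()):
            d += (p_global.data - p_local.data) / NUM_WORKERS  # averaged outer delta
    outer_opt.zero_grad()
    for p, d in zip(global_model.parameters(), deltas):
        p.grad = d                                    # treat the averaged delta as a pseudo-gradient
    outer_opt.step()
    print(f"outer round {outer_round} done")
```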