2023
DOI: 10.48550/arxiv.2302.04870
Preprint

Offsite-Tuning: Transfer Learning without Full Model

Abstract: Transfer learning is important for foundation models to adapt to downstream tasks. However, many foundation models are proprietary, so users must share their data with model owners to fine-tune the models, which is costly and raises privacy concerns. Moreover, fine-tuning large foundation models is computation-intensive and impractical for most downstream users. In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation mo…

Cited by 6 publications (11 citation statements) | References 35 publications
“…We run the same task of a method in the above-mentioned grid search space three times with different random seeds, choose the best result from each run, and report the mean and standard deviation of these best results. For all question-answering tasks, we sweep learning rates in {1, 3, 5, 7} × 10⁻⁴, batch sizes in {8, 16, 32} and the number of epochs in {3, 5, 10}, and keep other settings the same, which is inspired by [48]. The sequence length for all tasks is set to 512, 128, 128 and 128 for BERT-base, RoBERTa-large, BART-large and OPT as our baselines, respectively.…”
Section: Methods
confidence: 99%
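The sweep described in the excerpt can be illustrated with a minimal Python sketch. The grid values and seed count come from the excerpt; train_and_evaluate is a hypothetical placeholder for the actual fine-tuning run, not the cited paper's code.

```python
import itertools
import random
import statistics

# Grid from the excerpt: learning rates {1,3,5,7}e-4, batch sizes {8,16,32}, epochs {3,5,10}.
LEARNING_RATES = [1e-4, 3e-4, 5e-4, 7e-4]
BATCH_SIZES = [8, 16, 32]
EPOCHS = [3, 5, 10]
SEEDS = [0, 1, 2]  # three runs with different random seeds

def train_and_evaluate(lr: float, batch_size: int, epochs: int, seed: int) -> float:
    """Placeholder for a real fine-tuning run; returns a validation score."""
    rng = random.Random(hash((seed, lr, batch_size, epochs)))
    return rng.uniform(0.7, 0.9)  # stand-in score for illustration only

best_per_seed = []
for seed in SEEDS:
    # For each seed, sweep the full grid and keep the best score of that run.
    scores = [
        train_and_evaluate(lr, bs, ep, seed)
        for lr, bs, ep in itertools.product(LEARNING_RATES, BATCH_SIZES, EPOCHS)
    ]
    best_per_seed.append(max(scores))

# Report mean and standard deviation of the per-seed best results.
print(f"mean={statistics.mean(best_per_seed):.4f}, std={statistics.stdev(best_per_seed):.4f}")
```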
“…Zaken et al [46] introduced BitFit, a sparse update method that modifies only the bias terms of the model. Xiao et al [42] proposed offsite-tuning, which fine-tunes the top and bottom layers of the model and compresses the large middle layers into an emulator using layer-drop. Lester et al [28] proposed prompt tuning, requiring to store and tune only a small task-specific prompt (i.e., a few tokens) for each downstream task.…”
Section: Efficient Finetuning
confidence: 99%
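A minimal sketch of the layer-drop idea described in this excerpt, assuming the transformer is available as an ordered stack of blocks. The function name, the number of trainable layers, and the uniform-stride selection are illustrative assumptions, not the authors' exact recipe (which additionally distills the emulator).

```python
import torch.nn as nn

def build_offsite_modules(blocks: nn.ModuleList, num_trainable: int = 2, keep_ratio: float = 0.5):
    """Split transformer blocks into trainable top/bottom layers and a
    compressed, frozen emulator of the middle layers (layer-drop)."""
    bottom = blocks[:num_trainable]                  # trainable bottom layers
    top = blocks[-num_trainable:]                    # trainable top layers
    middle = blocks[num_trainable:-num_trainable]

    # Uniformly keep a subset of the middle layers as the emulator.
    stride = max(1, round(1.0 / keep_ratio))
    emulator = nn.ModuleList(middle[::stride])

    # The emulator stays frozen; only the top/bottom layers are updated downstream.
    for p in emulator.parameters():
        p.requires_grad = False
    return bottom, emulator, top
```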
“…Compared to full-parameter fine-tuning, these PEFT algorithms significantly reduce memory consumption, training time, and communication cost for fine-tuning LLMs. Besides, motivated by the practical concerns on intellectual property protection of LLMs, we also integrate a privacy-preserving fine-tuning algorithm, offsite-tuning (Xiao et al, 2023), for the scenario where clients only tune small adapters based on a distilled model from a full LLM.…”
Section: Overview
confidence: 99%
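To illustrate the “small adapters” that clients tune in this scenario, here is a minimal bottleneck-adapter sketch. The module name and sizes are illustrative assumptions, not the cited toolkit's actual API.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """A small residual adapter placed after a frozen transformer sublayer."""
    def __init__(self, hidden_size: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden_size)    # project back up
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual connection preserves the frozen backbone's output as a starting point.
        return x + self.up(self.act(self.down(x)))
```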
“…To satisfy such practical demand, we adapt a privacy-preserving fine-tuning algorithm, offsite-tuning (Xiao et al, 2023), to a federated version, and name it FedOT for short. It sends a lossy compressed model with untrainable parameters to the clients as an emulator of the complete LLM at the beginning of FL.…”
Section: Federated Fine-tuning Without Accessing Full Model
confidence: 99%
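A hedged sketch of the client-side setup implied by this excerpt: the received emulator's parameters are kept untrainable, and only the small trainable adapter layers are optimized. The emulator and adapters arguments are placeholder modules, not FedOT's actual interface.

```python
import torch
import torch.nn as nn

def make_client_optimizer(emulator: nn.Module, adapters: nn.Module, lr: float = 1e-4):
    """Freeze the lossy-compressed emulator and optimize only the adapters."""
    for p in emulator.parameters():
        p.requires_grad = False          # emulator weights stay untrainable on the client
    trainable = [p for p in adapters.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)
```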