Proceedings of the ACM Web Conference 2023
DOI: 10.1145/3543507.3583518
pFedPrompt: Learning Personalized Prompt for Vision-Language Models in Federated Learning

Cited by 8 publications (6 citation statements) · References 23 publications
“…FedCLIP [38] added an adapter module after the CLIP backbone to achieve efficient deployment of the CLIP model [70] with federated clients. Some studies [19, 35] have utilized the idea of prompt training to aggregate user consensus via a learnable prompt and capture users’ characteristics in the visual domain. Improving the ability to integrate large-scale pre-trained models will greatly enhance the performance of MFL systems.…”
Section: Discussion
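The prompt-training idea mentioned above — keeping the large vision-language backbone frozen and federating only a small learnable prompt — can be sketched as follows. This is an illustrative toy (dimensions, learning rate, and the gradient stand-in are assumptions, not values from the cited papers): clients take local steps on the prompt tensor alone, and the server averages the prompts, FedAvg-style.

```python
import numpy as np

# Illustrative sizes, not from the cited papers.
PROMPT_TOKENS, EMBED_DIM = 4, 8

def local_prompt_update(prompt, grad, lr=0.1):
    """One gradient step on the client's learnable prompt only;
    the backbone stays frozen and is never communicated."""
    return prompt - lr * grad

def server_aggregate(client_prompts):
    """FedAvg over prompt tensors: the only parameters exchanged."""
    return np.mean(client_prompts, axis=0)

rng = np.random.default_rng(0)
global_prompt = np.zeros((PROMPT_TOKENS, EMBED_DIM))

for _ in range(3):  # communication rounds
    updated = [
        # Random noise stands in for each client's local prompt gradient.
        local_prompt_update(global_prompt, rng.normal(size=global_prompt.shape))
        for _ in range(5)  # five simulated clients
    ]
    global_prompt = server_aggregate(updated)

print(global_prompt.shape)  # (4, 8)
```

The communication cost per round is just the prompt tensor, which is why this style of PEFT is attractive for federated clients.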
“…CreamFL [37] allowed both unimodal and multimodal vision–language tasks in federated systems. pFedPrompt [35] adapted the prompt-training method to bring large foundation models into federated learning systems, connecting vision and language data. FedCMR [11] explored the federated cross-modal retrieval task and mitigated the representation-space gap via weighted aggregation based on the local data amount and category number.…”
Section: Tasks for Multimodal Federated Learning
“…(1) Image classification. (Sun et al. 2022) evaluate the existing PEFT baselines combined with FL, while (Guo et al. 2022; Guo, Guo, and Wang 2023; Li et al. 2023; Lu et al. 2023) fine-tune the CLIP model (Radford et al. 2021) by tuning and communicating only a small number of learnable (personalized) prompts. (Su et al. 2022) addresses the problem of heterogeneous client images by injecting lightweight adaptation modules (adapters) (Houlsby et al. 2019).…”
Section: Related Work
“…Existing works have predominantly explored a basic combination of centralized PEFT algorithms and FedAvg. For instance, some approaches train and communicate only tiny adaptation modules (adapters) (Houlsby et al. 2019; Su et al. 2022) or a small number of trainable input tokens (Guo et al. 2022; Guo, Guo, and Wang 2023). However, these investigations are limited to single-modality scenarios, where only visual or textual tasks are considered.…”
Section: Introduction
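The adapter alternative described in the statement above (Houlsby-style modules inserted after a frozen backbone, as in FedCLIP) can be sketched as a small bottleneck with a residual connection. All dimensions and initializations here are illustrative assumptions; only the two adapter matrices would be trained and communicated in the federated setting.

```python
import numpy as np

def adapter(features, w_down, w_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual.
    Only w_down and w_up are trainable; the backbone is frozen."""
    hidden = np.maximum(features @ w_down, 0.0)  # down-projection + ReLU
    return features + hidden @ w_up              # up-projection + residual

rng = np.random.default_rng(0)
d, r = 16, 4  # feature dim and bottleneck dim (illustrative choices)
w_down = rng.normal(scale=0.02, size=(d, r))
w_up = rng.normal(scale=0.02, size=(r, d))

# Stand-in for features produced by a frozen CLIP-like backbone.
frozen_backbone_output = rng.normal(size=(2, d))
out = adapter(frozen_backbone_output, w_down, w_up)
print(out.shape)  # (2, 16)
```

Because r is much smaller than d, the per-round payload is roughly 2·d·r parameters instead of the full backbone, which is the efficiency argument these works make.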
“…Finally, FL algorithms like FedAvg and SCAFFOLD can be enhanced using momentum, leading to improved convergence rates and performance even under varying data heterogeneity and partial client participation [154]. The authors of [155] introduced personalized federated learning (pFL) and demonstrated its application in tailoring models for diverse users within a decentralized system. Additionally, they employed the Context Optimization (CoOp) method for fine-tuning pre-trained vision-language models.…”
Section: Federated Learning
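The momentum enhancement of FedAvg mentioned in the last statement can be sketched as a server-side momentum buffer (FedAvgM-style). This is a hedged toy, not the algorithm from [154]: function names, the two-client example, and the hyperparameters are all illustrative.

```python
import numpy as np

def fedavg_momentum_round(global_w, client_ws, velocity, beta=0.9, server_lr=1.0):
    """One round: average the client updates, pass them through a
    server-side momentum buffer, then apply to the global weights."""
    avg_delta = np.mean([w - global_w for w in client_ws], axis=0)
    velocity = beta * velocity + avg_delta
    return global_w + server_lr * velocity, velocity

w = np.zeros(3)        # global model (toy 3-parameter vector)
v = np.zeros(3)        # momentum buffer
clients = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
w, v = fedavg_momentum_round(w, clients, v)
print(w)  # the averaged client update, scaled through the momentum buffer
```

With beta = 0 this reduces to plain FedAvg; the momentum term is what smooths updates across rounds under heterogeneity and partial participation.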