2017
DOI: 10.48550/arxiv.1702.05796
Preprint

Collaborative Deep Reinforcement Learning

Kaixiang Lin,
Shu Wang,
Jiayu Zhou

Abstract: Besides independent learning, the human learning process is greatly improved by summarizing what has been learned, communicating it with peers, and subsequently fusing knowledge from different sources to assist the current learning goal. This collaborative learning procedure ensures that the knowledge is shared, continuously refined, and concluded from different perspectives to construct a more profound understanding. The idea of knowledge transfer has led to many advances in machine learning and data mining, but significa…

Cited by 4 publications (7 citation statements) | References 19 publications
“…Hence, policy training for the CDRL agents must be optimized jointly with the source agent selection and wireless resource allocation. Given these challenges, the problem in (4a)-(4d) cannot be solved via existing CDRL methods such as knowledge distillation [11]- [14] to transfer the source agents' knowledge to the target agent. Moreover, existing CDRL methods cannot be directly applied here as they do not account for the wireless and real-time constraints of knowledge sharing among agents.…”
Section: Problem Formulation
confidence: 99%
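The citation statement above refers to knowledge distillation as a way to transfer a source agent's policy knowledge to a target agent. As a rough illustration of that idea only (a minimal sketch, not the method of the cited papers; `distill_loss` and its signature are hypothetical), one common formulation minimizes the KL divergence from a teacher policy's action distribution to a student's:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits into a probability distribution over actions."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over action distributions for one state.

    Zero when the two policies agree exactly; positive otherwise.
    A higher temperature softens both distributions before comparison.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Minimizing this loss over states visited by the student pulls the student's action distribution toward the teacher's, which is the generic distillation objective the excerpt alludes to.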
“…where f n (θ n ) represents the loss function of the local model for agent n ∈ N . Considering policy gradient (PG) DRL algorithms [11], the loss function is defined as…”
Section: A Heterogeneous Federated DRL
confidence: 99%
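The excerpt above cuts off before the policy-gradient loss definition. As a generic illustration only (a REINFORCE-style surrogate, not necessarily the exact per-agent loss f_n(θ_n) used in the citing paper), the loss is typically the negative log-probability of the taken action weighted by the return:

```python
import math

def softmax(logits):
    """Probability distribution over actions from raw logits."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pg_loss(logits_batch, actions, returns):
    """REINFORCE surrogate loss for one batch of transitions:

        L(theta) = -(1/N) * sum_i log pi_theta(a_i | s_i) * G_i

    logits_batch: per-state action logits produced by the policy network
    actions:      index of the action taken in each state
    returns:      the (discounted) return G_i observed from each state
    """
    total = 0.0
    for logits, a, g in zip(logits_batch, actions, returns):
        probs = softmax(logits)
        total += -math.log(probs[a]) * g
    return total / len(actions)
```

Minimizing this quantity with respect to the policy parameters increases the probability of actions that led to high returns, which is the standard policy-gradient objective referenced by the excerpt.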
“…Our framework also learns from imperfect demonstrations, but treats actions performed by a peer policy as demonstrations. Collaborative learning is also studied in [Lin et al, 2017], however, with expensive pre-trained teachers. Meta-learning methods also make use of a teacher model to improve the sample efficiency [Xu et al, 2018b;Xu et al, 2018a;Zha et al, 2019b].…”
Section: Related Work
confidence: 99%