2020
DOI: 10.48550/arxiv.2006.04338
Preprint
A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

Abstract: We develop a mathematical framework for solving multi-task reinforcement learning problems based on a type of decentralized policy gradient method. The goal in multi-task reinforcement learning is to learn a common policy that operates effectively in different environments; these environments have similar (or overlapping) state and action spaces, but have different rewards and dynamics. Agents immersed in each of these environments communicate with other agents by sharing their models (i.e. their policy parame…
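To make the setup described in the abstract concrete, below is a minimal, self-contained sketch (Python/NumPy) of a decentralized policy-gradient loop of this general flavor: each agent takes a REINFORCE-style gradient step on its own task (its own rewards and dynamics) and then averages its policy parameters with its neighbors through a gossip matrix. The environments, the mixing matrix W, and all hyperparameters are illustrative assumptions, not the authors' implementation or experimental setup.

import numpy as np

# Illustrative sizes: N_AGENTS tasks sharing one state/action space (all assumed).
N_AGENTS = 3
N_STATES, N_ACTIONS = 5, 2
STEP_SIZE = 0.05
HORIZON = 20

# Doubly stochastic gossip (mixing) matrix over a ring communication graph.
W = np.zeros((N_AGENTS, N_AGENTS))
for i in range(N_AGENTS):
    W[i, i] = 0.5
    W[i, (i + 1) % N_AGENTS] += 0.25
    W[i, (i - 1) % N_AGENTS] += 0.25

rng = np.random.default_rng(0)
# Each task has its own transition kernel P[i][s, a] and reward table R[i][s, a]:
# shared state/action spaces but different dynamics and rewards.
P = [rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]
R = [rng.normal(size=(N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]

def softmax_policy(theta, s):
    # Tabular softmax policy; theta has shape (N_STATES, N_ACTIONS).
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def local_policy_gradient(theta, task, rng):
    # One REINFORCE rollout on this agent's own environment, returning a
    # return-weighted score-function estimate of the policy gradient.
    grad = np.zeros_like(theta)
    s, ret, traj = 0, 0.0, []
    for _ in range(HORIZON):
        probs = softmax_policy(theta, s)
        a = rng.choice(N_ACTIONS, p=probs)
        traj.append((s, a, probs))
        ret += R[task][s, a]
        s = rng.choice(N_STATES, p=P[task][s, a])
    for s, a, probs in traj:
        score = -probs
        score[a] += 1.0
        grad[s] += ret * score
    return grad / HORIZON

# Decentralized policy gradient: local gradient ascent followed by consensus
# (each agent replaces its parameters with a gossip average of its neighbors').
thetas = np.zeros((N_AGENTS, N_STATES, N_ACTIONS))
for _ in range(200):
    grads = np.stack([local_policy_gradient(thetas[i], i, rng) for i in range(N_AGENTS)])
    thetas = thetas + STEP_SIZE * grads
    thetas = np.tensordot(W, thetas, axes=1)

The consensus step is what drives the agents toward a single common policy even though each agent only ever interacts with, and collects rewards from, its own environment.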

Cited by 6 publications (10 citation statements)
References 32 publications
“…Additionally, the shared parameters are trained using more data (i.e., data drawn from all agents) compared with the personalized parameters, so the variance in the training process can be significantly reduced, potentially resulting in better training performance. Such an intuition has been verified empirically in multi-task RL systems [33,34], where sharing policies among different learners results in more stable convergence.…”
Section: The Proposed Formulation
confidence: 79%
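A back-of-the-envelope version of the variance argument quoted above (notation assumed for illustration, not taken from the cited works): if the shared parameters are updated with M independent, unbiased gradient estimates g_1, …, g_M, each with per-coordinate variance σ², then their average (1/M) Σ_{m=1}^{M} g_m is still unbiased and has per-coordinate variance σ²/M, so pooling data from all M agents shrinks the noise in each update by a factor of M relative to a single learner using only its own data.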
“…the homogeneous setting), the agents' policies should also be closely related to each other. Such an intuition has been verified empirically in MARL systems [31,32], as well as in multi-task RL systems [33,34], where sharing policies among different learners results in more stable convergence and/or better feature extraction. However, it is not clear how to design and analyze more sophisticated collaboration schemes that enable the agents to (partially) share their local policies to help them leverage each other's past experience and build better behavior strategies.…”
Section: Introduction
confidence: 82%
“…While our DSA framework is fairly general, a key limitation is that the scaling matrix (i.e., A) in each component function h_i needs to be the same. It would be interesting to see if our approach can be extended to cover the general case of Zeng et al. [2020a], where the scaling matrices also depend on i. Another intriguing future direction is the setting with dynamic communication protocols, wherein the gossip matrix also evolves with time (Doan et al. [2019, 2021]).…”
Section: Discussion
confidence: 99%
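For reference, the kind of gossip-based distributed stochastic approximation (DSA) update being discussed above can be written, in assumed notation that is not taken verbatim from the cited works, as

θ_i^{k+1} = Σ_{j=1}^{N} [W_k]_{ij} θ_j^k + α_k h_i(θ_i^k; ξ_i^k),    i = 1, …, N,

where W_k is the (possibly time-varying) gossip matrix, typically assumed doubly stochastic, α_k is a step size, and ξ_i^k is the sampling noise at node i. A decentralized policy gradient method is recovered, roughly speaking, by taking h_i to be agent i's stochastic policy-gradient estimate on its own task.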
“…The first is the CLT shown in Morral et al. [2017] for the average of estimates obtained at different nodes in a generic DSA scheme. The other is the convergence-in-mean result obtained in Zeng et al. [2020a] for a distributed policy gradient method.…”
Section: Related Work
confidence: 99%