2018
DOI: 10.48550/arxiv.1810.00468
Preprint

Bayesian Transfer Reinforcement Learning with Prior Knowledge Rules

Abstract: We propose a probabilistic framework for directly inserting prior knowledge into reinforcement learning (RL) algorithms by defining the behaviour policy as a Bayesian posterior distribution. Such a posterior combines task-specific information with prior knowledge, thereby enabling transfer learning across tasks. The resulting method is flexible and can easily be incorporated into any standard off-policy or on-policy algorithm, such as those based on temporal differences or policy gradients. We develop a …
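
The abstract describes the behaviour policy as a Bayesian posterior that combines prior knowledge with a task-specific policy. The sketch below illustrates that combination for a discrete action space as a renormalised product of the two distributions; the function name `posterior_policy` and the example probabilities are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def posterior_policy(prior, task_policy):
    """Combine a prior over actions with a task-specific policy via Bayes' rule.

    Both inputs are arrays of action probabilities; the behaviour policy is
    their element-wise product, renormalised over actions, so actions ruled
    out by the prior keep zero probability regardless of the task policy.
    """
    unnormalised = prior * task_policy        # f(a) * pi(a|s)
    return unnormalised / unnormalised.sum()  # normalise over actions

# Illustrative example: the prior knowledge rule forbids action 2 entirely.
prior = np.array([0.4, 0.4, 0.0, 0.2])
task_policy = np.array([0.1, 0.2, 0.5, 0.2])  # e.g. softmax output of a policy network
behaviour = posterior_policy(prior, task_policy)
print(behaviour)  # -> [0.25, 0.5, 0.0, 0.25]; action 2 stays excluded
```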

Cited by 2 publications (3 citation statements)
References 3 publications (9 reference statements)

“…For example, prior knowledge can be incorporated as constraints or loss functions in conventional machine learning algorithms. Since current breakthroughs on games rely mostly on reinforcement learning, which has low sample efficiency, achieving sample-efficient reinforcement learning based on human knowledge is a future direction [55], [56].…”
Section: Low Resources AI
confidence: 99%
“…This idea is inspired by the BDQ architecture [24] and extended to PG methods. In particular, this allows us to define Bayesian policies and incorporate prior information about the MA problem into the DRL agent [32].…”
Section: B Contributions and Outline
confidence: 99%
“…In order to incorporate this prior knowledge into the RL agent, we introduce a Bayesian policy inspired by [32]. We express the posterior policy $q(a \mid A; \theta_\pi)$ as a function of the prior over the agent state $f(a; A)$ and the task-specific policy $\pi(a \mid A; \theta_\pi)$, parameterized by $\theta_\pi$, via Bayes' rule:…”
Section: B Exploiting Prior Knowledge
confidence: 99%
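
Read literally, the quoted Bayes-rule combination of prior and task-specific policy corresponds to the normalised product below; the explicit normalisation over actions is our reading of the statement, not a formula quoted from the citing paper.

$$
q(a \mid A; \theta_\pi) \;=\; \frac{f(a; A)\,\pi(a \mid A; \theta_\pi)}{\sum_{a'} f(a'; A)\,\pi(a' \mid A; \theta_\pi)}
$$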