2012
DOI: 10.1007/s13748-012-0026-6
|View full text |Cite
|
Sign up to set email alerts
|

Learning domain structure through probabilistic policy reuse in reinforcement learning

Abstract: Policy Reuse is a transfer learning approach to improve a reinforcement learner with guidance from previously learned similar action policies. The method uses the past policies as a probabilistic bias where the learner chooses among the exploitation of the ongoing learned policy, the exploration of random unexplored actions, and the exploitation of past policies. In this work we demonstrate that Policy Reuse further contributes to the learning of the structure of a domain. Interestingly and almost as a side ef… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

1
17
0

Year Published

2013
2013
2023
2023

Publication Types

Select...
7
2
1

Relationship

1
9

Authors

Journals

citations
Cited by 28 publications
(18 citation statements)
references
References 26 publications
1
17
0
Order By: Relevance
“…Fernández et al proposed a policy selection method using probabilities in [15], [16]. In this method called PRQlearning, the reusing policy is decided based on Boltzmann distribution selection method (Eqn.…”
Section: A Probabilistic Policy Reusementioning
confidence: 99%
“…Fernández et al proposed a policy selection method using probabilities in [15], [16]. In this method called PRQlearning, the reusing policy is decided based on Boltzmann distribution selection method (Eqn.…”
Section: A Probabilistic Policy Reusementioning
confidence: 99%
“…The type of knowledge that can be transferred between tasks varies among different TL methods, including value functions [8], entire policies [9], actions (policy advice) [10], or a set of samples from a source task that can be used by a model-based RL algorithm in a target task [11].…”
Section: Transfer Learning and Advising Under A Budgetmentioning
confidence: 99%
“…where An E [0,1] is the transfer rate [11] and y~ni is a randomly generated policy. Now we present the detailed description of the proposed learning algorithm.…”
Section: A a Policy Transfer Based Hierarchical Multi-agent Qlearninmentioning
confidence: 99%