Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems 2006
DOI: 10.1145/1160633.1160762

Probabilistic policy reuse in a reinforcement learning agent

Abstract: We contribute Policy Reuse as a technique to improve a reinforcement learning agent with guidance from past learned similar policies. Our method relies on using the past policies as a probabilistic bias where the learning agent faces three choices: the exploitation of the ongoing learned policy, the exploration of random unexplored actions, and the exploitation of past policies. We introduce the algorithm and its major components: an exploration strategy to include the new reuse bias, and a similarity function…
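
The three-way choice described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `choose_action`, the parameters `psi` (probability of following a past policy) and `epsilon` (probability of a random action), and the dictionary-based Q-table are all assumptions made for illustration.

```python
import random


def choose_action(state, q_values, past_policy, actions, psi, epsilon, rng=random):
    """Pick an action through a three-way probabilistic choice.

    Hypothetical sketch: `psi` and `epsilon` are assumed parameter names,
    `q_values` is assumed to map (state, action) pairs to value estimates,
    and `past_policy` is assumed to be a callable from state to action.
    """
    # Choice 1: exploit a past policy with probability psi.
    if rng.random() < psi:
        return past_policy(state)
    # Choice 2: explore a random action with probability epsilon.
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Choice 3: exploit the policy currently being learned (greedy on Q).
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))


if __name__ == "__main__":
    q = {("s0", "left"): 0.1, ("s0", "right"): 0.9}
    follow_left = lambda s: "left"  # stand-in for a previously learned policy
    print(choose_action("s0", q, follow_left, ["left", "right"], psi=0.3, epsilon=0.1))
```

In a sketch like this, lowering `psi` over the course of an episode would shift control from the past policy toward the policy being learned; how the bias is actually scheduled is left open here.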

Citations: cited by 189 publications (201 citation statements)
References: 9 publications (8 reference statements)
“…The level of knowledge that can be transferred across tasks can be low, such as tuples of the form ⟨s, a, r, s′⟩ [6,10], value-functions [12] or policies [2]. Higher level knowledge may include rules [7,13], action subsets or shaping rewards [5].…”
Section: Transfer Learning in RL (mentioning)
confidence: 99%
“…We also briefly introduce a similarity concept between policies. Lastly, we review the PRQ-Learning algorithm [8].…”
Section: Policy Reuse (mentioning)
confidence: 99%
“…An efficient solution to Policy Reuse is the PRQ-Learning algorithm [8], which automatically answers two questions: (i) which policy, from the set {Π*₁, . .…”
Section: Domains, Tasks and MDPs (mentioning)
confidence: 99%