2016
DOI: 10.1007/s10994-016-5580-x

Probabilistic inference for determining options in reinforcement learning

Abstract: Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process (SMDP) setting and the option framework, we propose a model that aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently…
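The abstract's core construct is an option: a sub-policy paired with initiation and termination probabilities, executed inside an SMDP. A minimal Python sketch of that construct follows; the Option container, run_option, and the env.step interface are all assumed names for illustration, not the paper's implementation.

# Minimal sketch of the option construct described in the abstract;
# all names here are illustrative assumptions, not the paper's code.
import random
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Option:
    """A sub-policy plus its initiation and termination probabilities."""
    initiation_prob: Callable[[Any], float]   # P(option may start in state s)
    policy: Callable[[Any], Any]              # sub-policy: state -> action
    termination_prob: Callable[[Any], float]  # beta(s): P(option ends in s)

def run_option(env, state, option, max_steps=100):
    """Execute one option until termination; return the SMDP-level outcome."""
    total_reward, duration = 0.0, 0
    for _ in range(max_steps):
        action = option.policy(state)
        state, reward, done = env.step(action)  # assumed Gym-style interface
        total_reward += reward
        duration += 1
        if done or random.random() < option.termination_prob(state):
            break
    return state, total_reward, duration  # (s', cumulative reward, tau)

At the SMDP level, one call to run_option behaves like a single temporally extended action, which is what lets options shorten long decision sequences.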

Cited by 73 publications (86 citation statements)
References 21 publications (30 reference statements)
“…Other related work on hierarchical formulations includes Feudal RL [24], in which "managers" take decisions at various levels of granularity, percolating all the way down to atomic actions made by the agent. [25] jointly learn options and hierarchical policies over them. Such joint search makes the problem more difficult to solve; moreover, options are not shared across policies of different tasks.…”
Section: Related Work
confidence: 99%
“…Some of these approaches identify frequently used action sequences from successful trajectories (McGovern, 2002; Girgin et al., 2006; Vezhnevets et al., 2016). Other approaches posit a generative model for policies that favors temporal abstraction, and then perform probabilistic inference to find the optimal policy (Wingate et al., 2013; Daniel et al., 2016).…”
Section: Option Discovery
confidence: 99%
“…We show how to apply a similar idea to differential dynamic programming (DDP) with stochastic dynamics and partial observations. Instead of using hybrid control for optimizing trajectories, reinforcement learning approaches based on the options framework can compute high-level discrete actions, also called options [19], and execute a continuous control policy for each high-level action.…”
Section: Related Work
confidence: 99%
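The statement above describes a two-level scheme: a discrete high-level choice among options, each of which runs its own continuous controller. The following sketch illustrates that structure under toy assumptions; the linear dynamics, the hand-coded gains, and every function name are invented for illustration and are not the cited method.

# Hedged sketch of discrete option selection over continuous controllers.
import numpy as np

rng = np.random.default_rng(0)

def step(s, u):
    """Toy stochastic linear dynamics standing in for the real system."""
    return 0.9 * s + u + 0.01 * rng.standard_normal(s.shape)

def make_linear_controller(K, noise=0.05):
    """Continuous low-level policy for one option: u = K @ s + noise."""
    return lambda s: K @ s + noise * rng.standard_normal(s.shape)

# Two hand-coded options with arbitrary gains (sketch only).
options = [make_linear_controller(np.diag([-0.5, 0.0])),
           make_linear_controller(np.diag([0.0, -0.5]))]

def select_option(s):
    """High-level discrete decision: damp the larger state component."""
    return 0 if abs(s[0]) >= abs(s[1]) else 1

s = np.array([1.0, -2.0])
for t in range(10):
    o = select_option(s)      # discrete high-level action = option index
    for _ in range(5):        # run that option's continuous controller
        s = step(s, options[o](s))
print("final state:", s)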