2016
DOI: 10.1007/s10994-016-5580-x

Probabilistic inference for determining options in reinforcement learning

Abstract: Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process (SMDP) setting and the option framework, we propose a model that aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently…
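The abstract's core construct is an option: a sub-policy paired with initiation and termination probabilities, executed inside an SMDP. A minimal Python sketch of that construct follows; the Option container, run_option, and the env.step interface are all assumed names for illustration, not the paper's implementation.

# Minimal sketch of the option construct described in the abstract;
# all names here are illustrative assumptions, not the paper's code.
import random
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Option:
    """A sub-policy plus its initiation and termination probabilities."""
    initiation_prob: Callable[[Any], float]   # P(option may start in state s)
    policy: Callable[[Any], Any]              # sub-policy: state -> action
    termination_prob: Callable[[Any], float]  # beta(s): P(option ends in s)

def run_option(env, state, option, max_steps=100):
    """Execute one option until termination; return the SMDP-level outcome."""
    total_reward, duration = 0.0, 0
    for _ in range(max_steps):
        action = option.policy(state)
        state, reward, done = env.step(action)  # assumed Gym-style interface
        total_reward += reward
        duration += 1
        if done or random.random() < option.termination_prob(state):
            break
    return state, total_reward, duration  # (s', cumulative reward, tau)

At the SMDP level, one call to run_option behaves like a single temporally extended action, which is what lets options shorten long decision sequences.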

Cited by 73 publications (86 citation statements)
References 21 publications (30 reference statements)
“…Other related work on hierarchical formulations includes Feudal RL [24], in which "managers" take decisions at various levels of granularity, percolating all the way down to atomic actions made by the agent. [25] jointly learn options and hierarchical policies over them. Such joint search makes the problem more difficult to solve; moreover, options are not shared across policies of different tasks.…”
Section: Related Work
confidence: 99%
“…Some of these approaches identify frequently used action sequences from successful trajectories (McGovern, 2002; Girgin et al., 2006; Vezhnevets et al., 2016). Other approaches posit a generative model for policies that favors temporal abstraction, and then perform probabilistic inference to find the optimal policy (Wingate et al., 2013; Daniel et al., 2016).…”
Section: Option Discovery
confidence: 99%
“…We show how to apply a similar idea to differential dynamic programming (DDP) with stochastic dynamics and partial observations. Instead of using hybrid control for optimizing trajectories, reinforcement learning approaches based on the options framework can compute high-level discrete actions, also called options [19], and execute a continuous control policy for each high-level action.…”
Section: Related Work
confidence: 99%
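The statement above describes a two-level scheme: a discrete high-level choice among options, each of which runs its own continuous controller. The following sketch illustrates that structure under toy assumptions; the linear dynamics, the hand-coded gains, and every function name are invented for illustration and are not the cited method.

# Hedged sketch of discrete option selection over continuous controllers.
import numpy as np

rng = np.random.default_rng(0)

def step(s, u):
    """Toy stochastic linear dynamics standing in for the real system."""
    return 0.9 * s + u + 0.01 * rng.standard_normal(s.shape)

def make_linear_controller(K, noise=0.05):
    """Continuous low-level policy for one option: u = K @ s + noise."""
    return lambda s: K @ s + noise * rng.standard_normal(s.shape)

# Two hand-coded options with arbitrary gains (sketch only).
options = [make_linear_controller(np.diag([-0.5, 0.0])),
           make_linear_controller(np.diag([0.0, -0.5]))]

def select_option(s):
    """High-level discrete decision: damp the larger state component."""
    return 0 if abs(s[0]) >= abs(s[1]) else 1

s = np.array([1.0, -2.0])
for t in range(10):
    o = select_option(s)      # discrete high-level action = option index
    for _ in range(5):        # run that option's continuous controller
        s = step(s, options[o](s))
print("final state:", s)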