2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros.2016.7759596

Optimal control and inverse optimal control by distribution matching

Abstract: Optimal control is a powerful approach to achieve optimal behavior. However, it typically requires a manual specification of a cost function, which often contains several objectives, such as reaching goal positions at different time steps or energy efficiency. Manually trading off these objectives is often difficult and requires a high engineering effort. In this paper, we present a new approach to specify optimal behavior. We directly specify the desired behavior by a distribution over future states o…
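As a rough illustration of the idea described in the abstract, the hand-tuned cost function is replaced by a target distribution that the controlled system should match. A minimal way to write such an objective (in our own notation, not necessarily the paper's exact formulation) is:

```latex
% Illustrative distribution-matching objective (notation is ours):
% p_\pi(s_t) is the state distribution induced by the policy \pi at time t,
% q_t(s_t) is the user-specified desired distribution for that time step,
% and D is a divergence such as the Kullback-Leibler divergence.
\min_{\pi} \; \sum_{t} D\big( p_{\pi}(s_t) \,\|\, q_t(s_t) \big)
```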

Cited by 4 publications (5 citation statements, all classified as mentioning); citing publications span 2017 to 2021. References 13 publications.

Citation statements, ordered by relevance:
“…The trajectory distribution p(τ) can be considered as a special case of the feature distribution. Behavior cloning methods such as […, Englert et al., 2013] and inverse reinforcement learning methods such as [Arenz et al., 2016] use feature distributions.…”
Section: Trajectory Feature Distribution (mentioning; confidence: 99%)
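To make the relation in the excerpt above concrete: a feature map φ applied to trajectories induces a distribution over feature values, and choosing φ as the identity recovers the trajectory distribution itself. The notation below (φ, p_φ) is ours, added for illustration:

```latex
% Feature distribution induced by the trajectory distribution p(\tau)
% through a feature map \phi (illustrative notation):
p_{\phi}(f) = \int \delta\big(f - \phi(\tau)\big)\, p(\tau)\, \mathrm{d}\tau,
\qquad
\phi(\tau) = \tau \;\Rightarrow\; p_{\phi} = p.
```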
“…Employed by: maximum margin [Ng and Russell, 2000, …, 2009, Zucker et al., 2011]; maximum entropy […, Ramachandran and Amir, 2007, Choi and Kim, 2011b, Kitani et al., 2012, Shiarlis et al., 2016, Finn et al., 2016b]; other [Doerr et al., 2015, Arenz et al., 2016] … a nonlinear reward function. On the other hand, IRL with a reward function that is nonlinear in the features is more challenging than IRL with linear reward functions.…”
Section: Objectives (mentioning; confidence: 99%)
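The contrast drawn in this excerpt between linear and nonlinear reward functions can be written compactly as follows (φ(s, a) denotes a feature vector; the notation is ours, added for illustration):

```latex
% Linear vs. nonlinear reward parameterizations in IRL (illustrative notation):
r_{\mathrm{lin}}(s,a) = w^{\top}\phi(s,a),
\qquad
r_{\mathrm{nonlin}}(s,a) = f_{\theta}\big(\phi(s,a)\big)
\quad \text{with } f_{\theta} \text{ nonlinear, e.g.\ a neural network.}
```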
“…On one hand, it regularizes the policy optimization step, which is crucial for convergence of the overall minimax problem. On the other hand, it offers a tractable maximum-entropy SOC framework for dealing with nonlinear dynamics through iterative linearization [21], [23]. To summarize, for every iteration k, we iterate over the updates of the worst-case distribution and its respective optimal policy; for more details, see Algorithm 1.…”
Section: Problem Formulation (mentioning; confidence: 99%)
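The alternating scheme sketched in this excerpt can be illustrated with a short skeleton. This is only a structural sketch under the assumption that the cited Algorithm 1 alternates the two updates as described; the function names and update rules here are placeholders, not the authors' actual implementation:

```python
# Structural sketch of an alternating minimax scheme (placeholder functions):
# at every iteration k, first update the worst-case distribution for the
# current policy, then re-optimize the (entropy-regularized) policy against it.
def alternating_minimax(policy, distribution, update_worst_case, optimize_policy,
                        num_iterations=100):
    for k in range(num_iterations):
        # Inner maximization: worst-case distribution for the current policy.
        distribution = update_worst_case(policy, distribution)
        # Inner minimization: entropy-regularized policy optimization
        # against the updated worst-case distribution.
        policy = optimize_policy(distribution, policy)
    return policy, distribution
```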
“…Both areas of research are often formalized as distribution-matching, that is, the learned policy (or the optimal policy for IRL) should induce a distribution over states and actions that is close to the expert's distribution with respect to a given (usually non-metric) distance. Commonly applied distances are the forward Kullback-Leibler (KL) divergence (e.g., Ziebart, 2010), which maximizes the likelihood of the demonstrated state-action pairs under the agent's distribution, and the reverse Kullback-Leibler (RKL) divergence (e.g., Arenz et al., 2016; Fu et al., 2018; Ghasemipour et al., 2020), which minimizes the expected discrimination information (Kullback and Leibler, 1951) of state-action pairs sampled from the agent's distribution. However, since the emergence of generative adversarial networks (GANs; Goodfellow et al., 2014) as a solution technique for both areas, other divergences have been investigated, such as the Jensen-Shannon divergence (Ho and Ermon, 2016), the Wasserstein distance (Xiao et al., 2019) and general f-divergences (Ke et al., 2019; Ghasemipour et al., 2020).…”
Section: Introduction (mentioning; confidence: 99%)
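For reference, the two divergences named in this excerpt can be written explicitly; ρ_E and ρ_π below denote the expert's and the agent's state-action distributions (this notation is ours, added for illustration):

```latex
% Forward and reverse KL divergences for state-action distribution matching
% (\rho_E: expert distribution, \rho_\pi: agent distribution; illustrative notation).
\begin{align}
  \text{forward KL:} \quad
  D_{\mathrm{KL}}(\rho_E \,\|\, \rho_\pi)
    &= \mathbb{E}_{(s,a)\sim\rho_E}\!\left[\log\frac{\rho_E(s,a)}{\rho_\pi(s,a)}\right],\\
  \text{reverse KL:} \quad
  D_{\mathrm{KL}}(\rho_\pi \,\|\, \rho_E)
    &= \mathbb{E}_{(s,a)\sim\rho_\pi}\!\left[\log\frac{\rho_\pi(s,a)}{\rho_E(s,a)}\right].
\end{align}
```

Minimizing the forward KL over π is equivalent to maximizing the expected log-likelihood of expert state-action pairs under the agent's distribution, which matches the description above; the reverse KL instead takes the expectation under the agent's own distribution.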