From inverse optimal control to inverse reinforcement learning: A historical review
2020
DOI: 10.1016/j.arcontrol.2020.06.001

Cited by 64 publications (31 citation statements)
References 104 publications
“…Given the exact computation of the control gain K, we present the Stochastic Model-Free IOC LQR SDP Optimization problem, $P_{\mathrm{inv}}$, in (7), and claim that $P_{\mathrm{inv}}$ is well-posed and obtains unique solutions, $(Q_{P_{\mathrm{inv}}}, R_{P_{\mathrm{inv}}})$, within a scalar ambiguity of $(Q, R)$. We prove these claims via Theorems 1 and 2.…”
Section: B. Stochastic Model-Free IOC LQR SDP Optimization
confidence: 99%
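The cited formulation is stochastic and model-free, which is not reproduced here; but the core idea it builds on can be illustrated with a minimal sketch of the classical model-based IOC LQR feasibility problem as an SDP. Everything below — the system matrices, the trace normalization used to pin the scalar ambiguity, and the use of cvxpy — is an illustrative assumption, not the cited paper's method.

```python
# Minimal sketch (assumed: model-based, discrete-time IOC LQR, illustrative
# system): given (A, B) and an observed optimal gain K, recover (Q, R)
# up to the scalar ambiguity discussed in the quote.
import cvxpy as cp
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical system and true costs, used only to generate an "expert" K.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q_true = np.diag([2.0, 1.0])
R_true = np.array([[0.5]])

# Forward LQR: the gain K the inverse problem observes.
P_true = solve_discrete_are(A, B, Q_true, R_true)
K = np.linalg.solve(R_true + B.T @ P_true @ B, B.T @ P_true @ A)

n, m = B.shape
P = cp.Variable((n, n), symmetric=True)
Q = cp.Variable((n, n), symmetric=True)
R = cp.Variable((m, m), symmetric=True)

constraints = [
    # Stationarity of the optimal gain: (R + B'PB) K = B'PA, linear in (P, R).
    (R + B.T @ P @ B) @ K == B.T @ P @ A,
    # Riccati equation rewritten with the closed loop A - BK, linear in (P, Q).
    P == Q + A.T @ P @ (A - B @ K),
    Q >> 0, P >> 0, R >> 1e-6 * np.eye(m),
    # Pin the scalar ambiguity by fixing the scale of R.
    cp.trace(R) == np.trace(R_true),
]
cp.Problem(cp.Minimize(0), constraints).solve()
print(np.round(Q.value, 3))  # matches Q_true up to solver tolerance
                             # when the inverse problem is well-posed
```

Both constraints are linear in (P, Q, R), so the search is a semidefinite feasibility problem; without the trace normalization, any positive scaling of a feasible (P, Q, R) is also feasible, which is exactly the scalar ambiguity the theorems in the quote account for.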
“…Historically, many IOC LQR works have focused on recovering the cost function parameters (Q, R) for a known stabilizing gain K [2], [5], [6]. More recently, however, the assumption that the control gain K is known has been relaxed, and emphasis has been placed on the well-posedness of the IOC LQR problem, i.e., the feasibility of the IOC LQR and the existence and uniqueness of the feasible solution [7]. Not only does ill-posedness make the IOC LQR difficult to solve numerically, but even when a solution exists, non-uniqueness prevents useful inferences about the decision-making process [8].…”
Section: Introduction
confidence: 99%
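The scalar ambiguity behind these uniqueness statements is easy to verify numerically: scaling (Q, R) by any positive constant leaves the optimal LQR gain unchanged, because the Riccati equation is homogeneous of degree one in (P, Q, R). A minimal check, using an illustrative discrete-time system (all values assumed for demonstration):

```python
# Numeric check: K(cQ, cR) == K(Q, R) for every c > 0, which is why
# IOC LQR uniqueness is always stated "up to a scalar".
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative system
B = np.array([[0.0], [0.1]])

def lqr_gain(Q, R):
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

Q, R = np.diag([2.0, 1.0]), np.array([[0.5]])
for c in (1.0, 3.0, 10.0):
    print(c, lqr_gain(c * Q, c * R))     # identical gain for every c
```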
“…This study exploited IRL built upon the framework provided by MDPs [28]. MDPs express process objectives mathematically as a reward function. The reward function provides a scalar feedback signal indicative of the optimality of process evolution.…”
Section: Learning From Demonstrations Via Apprenticeship
confidence: 99%
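As a concrete picture of the MDP framing described in this quote, the sketch below encodes a process objective as a scalar reward function over state-action pairs; in IRL that function is the unknown to be inferred from demonstrations rather than specified by hand. The toy two-state process and all names are hypothetical, for illustration only:

```python
# Minimal MDP sketch (hypothetical example): the objective lives entirely
# in the scalar reward signal reward(s, a); IRL inverts this mapping.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State, Action = int, int

@dataclass
class MDP:
    states: List[State]
    actions: List[Action]
    transition: Dict[Tuple[State, Action], State]  # deterministic, for brevity
    reward: Callable[[State, Action], float]       # scalar feedback signal
    gamma: float                                   # discount factor

# Two-state example: action 1 moves toward the goal (state 1) and is rewarded.
mdp = MDP(
    states=[0, 1],
    actions=[0, 1],
    transition={(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1},
    reward=lambda s, a: 1.0 if s == 1 else 0.0,
    gamma=0.9,
)
```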
“…This study exploited IRL built upon the framework provided by MDPs [28]. MDPs express process objectives mathematically as a reward function.…”
Section: Preliminaries
confidence: 99%