From inverse optimal control to inverse reinforcement learning: A historical review
2020
DOI: 10.1016/j.arcontrol.2020.06.001

Cited by 64 publications (31 citation statements)
References 104 publications
“…Given the exact computation of the control gain K, we present the Stochastic Model-Free IOC LQR SDP Optimization problem, $P_{\mathrm{inv}}$, in (7), and claim that $P_{\mathrm{inv}}$ is well-posed and obtains unique solutions, $(Q_{P_{\mathrm{inv}}}, R_{P_{\mathrm{inv}}})$, within a scalar ambiguity of $(Q, R)$. We prove these claims via Theorems 1 and 2.…”
Section: B. Stochastic Model-Free IOC LQR SDP Optimization
confidence: 99%
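The cited formulation is stochastic and model-free, which is not reproduced here; but the core idea it builds on can be illustrated with a minimal sketch of the classical model-based IOC LQR feasibility problem as an SDP. Everything below — the system matrices, the trace normalization used to pin the scalar ambiguity, and the use of cvxpy — is an illustrative assumption, not the cited paper's method.

```python
# Minimal sketch (assumed: model-based, discrete-time IOC LQR, illustrative
# system): given (A, B) and an observed optimal gain K, recover (Q, R)
# up to the scalar ambiguity discussed in the quote.
import cvxpy as cp
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical system and true costs, used only to generate an "expert" K.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q_true = np.diag([2.0, 1.0])
R_true = np.array([[0.5]])

# Forward LQR: the gain K the inverse problem observes.
P_true = solve_discrete_are(A, B, Q_true, R_true)
K = np.linalg.solve(R_true + B.T @ P_true @ B, B.T @ P_true @ A)

n, m = B.shape
P = cp.Variable((n, n), symmetric=True)
Q = cp.Variable((n, n), symmetric=True)
R = cp.Variable((m, m), symmetric=True)

constraints = [
    # Stationarity of the optimal gain: (R + B'PB) K = B'PA, linear in (P, R).
    (R + B.T @ P @ B) @ K == B.T @ P @ A,
    # Riccati equation rewritten with the closed loop A - BK, linear in (P, Q).
    P == Q + A.T @ P @ (A - B @ K),
    Q >> 0, P >> 0, R >> 1e-6 * np.eye(m),
    # Pin the scalar ambiguity by fixing the scale of R.
    cp.trace(R) == np.trace(R_true),
]
cp.Problem(cp.Minimize(0), constraints).solve()
print(np.round(Q.value, 3))  # matches Q_true up to solver tolerance
                             # when the inverse problem is well-posed
```

Both constraints are linear in (P, Q, R), so the search is a semidefinite feasibility problem; without the trace normalization, any positive scaling of a feasible (P, Q, R) is also feasible, which is exactly the scalar ambiguity the theorems in the quote account for.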
“…Historically, many IOC LQR works have focused on recovering the cost function parameters (Q, R) for a known stabilizing gain K [2], [5], [6]. More recently, however, the assumption that the control gain K is known has been relaxed, and emphasis has been placed on the well-posedness of the IOC LQR problem, i.e., the feasibility of the IOC LQR and the existence and uniqueness of the feasible solution [7]. Not only does ill-posedness make the IOC LQR difficult to solve numerically, but even when a solution exists, non-uniqueness prevents useful inferences about the decision-making process [8].…”
Section: Introduction
confidence: 99%
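The scalar ambiguity behind these uniqueness statements is easy to verify numerically: scaling (Q, R) by any positive constant leaves the optimal LQR gain unchanged, because the Riccati equation is homogeneous of degree one in (P, Q, R). A minimal check, using an illustrative discrete-time system (all values assumed for demonstration):

```python
# Numeric check: K(cQ, cR) == K(Q, R) for every c > 0, which is why
# IOC LQR uniqueness is always stated "up to a scalar".
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative system
B = np.array([[0.0], [0.1]])

def lqr_gain(Q, R):
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

Q, R = np.diag([2.0, 1.0]), np.array([[0.5]])
for c in (1.0, 3.0, 10.0):
    print(c, lqr_gain(c * Q, c * R))     # identical gain for every c
```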
“…This study exploited IRL built upon the framework provided by MDPs [28]. MDPs express process objectives mathematically as a reward function. The reward function provides a scalar feedback signal indicative of the optimality of process evolution.…”
Section: Learning From Demonstrations Via Apprenticeship
confidence: 99%
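As a concrete picture of the MDP framing described in this quote, the sketch below encodes a process objective as a scalar reward function over state-action pairs; in IRL that function is the unknown to be inferred from demonstrations rather than specified by hand. The toy two-state process and all names are hypothetical, for illustration only:

```python
# Minimal MDP sketch (hypothetical example): the objective lives entirely
# in the scalar reward signal reward(s, a); IRL inverts this mapping.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State, Action = int, int

@dataclass
class MDP:
    states: List[State]
    actions: List[Action]
    transition: Dict[Tuple[State, Action], State]  # deterministic, for brevity
    reward: Callable[[State, Action], float]       # scalar feedback signal
    gamma: float                                   # discount factor

# Two-state example: action 1 moves toward the goal (state 1) and is rewarded.
mdp = MDP(
    states=[0, 1],
    actions=[0, 1],
    transition={(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1},
    reward=lambda s, a: 1.0 if s == 1 else 0.0,
    gamma=0.9,
)
```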
“…This study exploited IRL built upon the framework provided by MDPs [28]. MDPs express process objectives mathematically as a reward function.…”
Section: Preliminaries
confidence: 99%