“…This paper builds on the authors' previous work in [22], [23], where concurrent learning (CL) update laws are utilized to estimate reward functions online using output feedback. However, the dynamical systems in [22], [23] are required to be in Brunovsky canonical form, and as such, only the output-feedback case where the state comprises the output and its derivatives is addressed. In contrast, the IRL observer (IRL-O) technique in this paper generalizes to any observable linear system, since the developed IRL-Os are in a standard observer form in which the state estimates are corrected based on the innovation (i.e., the error between the actual and the estimated output).…”
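The "standard observer form" referenced in the excerpt can be illustrated with a classical Luenberger observer, where the state estimate is driven by the innovation term. The sketch below is not the paper's IRL-O; it is a minimal illustration, assuming a hypothetical double-integrator system and a hand-picked observer gain, of how an innovation-based correction makes the estimate converge for any observable linear system.

```python
import numpy as np

# Hypothetical observable linear system (double integrator), not from the paper:
#   x_dot = A x + B u,  y = C x
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

# Observer gain L chosen so A - L C has eigenvalues {-1, -2} (stable error dynamics)
L = np.array([[3.0], [2.0]])

dt, T = 0.001, 10.0
x = np.array([[1.0], [-0.5]])   # true state (unknown to the observer)
xhat = np.zeros((2, 1))         # observer initialized with no knowledge of x

for k in range(int(T / dt)):
    u = np.array([[np.sin(k * dt)]])          # arbitrary known input signal
    y = C @ x                                 # measured output
    innovation = y - C @ xhat                 # error between actual and estimated output
    # Forward-Euler integration of plant and observer:
    x = x + dt * (A @ x + B @ u)
    xhat = xhat + dt * (A @ xhat + B @ u + L @ innovation)

err = float(np.linalg.norm(x - xhat))
print(err)  # estimation error is driven toward zero by the innovation term
```

Because the estimation-error dynamics are governed by A - LC, the correction works for any observable (A, C) pair once L is chosen to place the error eigenvalues in the open left half-plane; this is the generality the excerpt contrasts with the Brunovsky-form restriction of [22], [23].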