2019
DOI: 10.29007/25x3
A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning

Abstract: On-policy imitation learning algorithms such as DAgger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlying trajectory distribution is static (stationary), it is possible to prove convergence for DAgger. However, in more…
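The loop described in the abstract can be sketched in a few lines. The `env`, `supervisor`, and `policy` objects below, and their `reset`/`step`/`act`/`fit` methods, are hypothetical placeholders rather than an interface from the paper; the sketch only illustrates the execute, relabel, aggregate, refit cycle that DAgger-style algorithms iterate.

```python
import numpy as np

def on_policy_imitation(env, supervisor, policy, n_iters=20, horizon=100):
    """DAgger-style on-policy imitation learning loop (illustrative sketch)."""
    states, labels, losses = [], [], []
    for _ in range(n_iters):
        s, rollout = env.reset(), []
        for _ in range(horizon):
            a = policy.act(s)                  # execute the current policy
            rollout.append(s)
            s, done = env.step(a)
            if done:
                break
        targets = [supervisor.act(x) for x in rollout]   # corrective feedback from the supervisor
        # per-iteration performance (loss) of the executed policy on its own states
        losses.append(np.mean([np.linalg.norm(policy.act(x) - y)
                               for x, y in zip(rollout, targets)]))
        states += rollout
        labels += targets
        policy = policy.fit(np.asarray(states), np.asarray(labels))  # generate the next policy
    return policy, losses
```

Whether the per-iteration losses settle down as this loop runs is exactly the convergence question the paper studies.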

Cited by 75 publications (19 citation statements)
References 12 publications
“…We choose not to pursue algorithms with fast static regret rates in COL, as there have been studies on how algorithms can systematically leverage continuity in COL to accelerate learning (Cheng et al., 2019), though they are disguised as online IL research. On the contrary, less is known about dynamic regret, except for (Cheng et al., 2019; Lee et al., 2018) (also disguised as online IL), which study the convergence of FTL and mirror descent, respectively.…”
Section: Results (mentioning)
confidence: 99%
“…An early analysis of IL was framed using the adversarial, static regret setup (Ross et al., 2011). Recently, results were refined through the use of continuity in the bifunction and dynamic regret (Lee et al., 2018; Cheng et al., 2019). This problem again highlights the importance of treating stochasticity as the feedback.…”
Section: Examples (mentioning)
confidence: 99%
“…Using these insights of COL, we revisit online imitation learning (IL) [4] and show it can be framed as a COL problem. We demonstrate that, by using standard analyses of COL, we are able to recover and improve existing understanding of online IL algorithms [4,5,6]. In particular, we characterize existence and uniqueness of solutions, and present convergence and dynamic regret bounds for a common class of IL algorithms in deterministic and stochastic settings.…”
Section: Introduction (mentioning)
confidence: 91%
“…The use of online learning to analyze online IL is well established [4]. As studied in [5,6], these online losses can be formulated through a bifunction, $\ell_n(\pi) = f_{\pi_n}(\pi) = \mathbb{E}_{s \sim d_{\pi_n}}[c(s, \pi; \pi^\star)]$, and the policy class $\Pi$ can be viewed as the decision set $\mathcal{X}$. Naturally, this online learning formulation results in many online IL algorithms resembling standard online learning algorithms, such as follow-the-leader (FTL), which uses the full-information feedback $\ell_n(\cdot) = \mathbb{E}_{s \sim d_{\pi_n}}[c(s, \cdot; \pi^\star)]$ at each round [4], and mirror descent [23], which uses the first-order feedback…”
Section: Application to Online Imitation Learning (mentioning)
confidence: 99%
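As a concrete illustration of the full-information (FTL) feedback in this formulation, the sketch below approximates each loss $\ell_m(\pi) = \mathbb{E}_{s \sim d_{\pi_m}}[c(s, \pi; \pi^\star)]$ with the states sampled under $\pi_m$ and expert labels, so the FTL update $\pi_{n+1} = \arg\min_{\pi} \sum_{m \le n} \ell_m(\pi)$ reduces to refitting on the aggregated dataset. The `policy_class` and `expert` interfaces are assumptions for the example (scikit-learn-style `fit`, a callable `expert.act`), not APIs from the cited papers.

```python
import numpy as np

def ftl_policy_update(policy_class, state_batches, expert):
    """Follow-the-leader step for online IL (sketch).

    state_batches[m] holds states sampled from d_{pi_m} during round m's
    roll-out, and the expert pi* labels them. Minimizing the sum of the
    empirical per-round losses is then just a supervised fit on the union
    of the aggregated, expert-labeled data (the DAgger update).
    """
    all_states = np.concatenate(state_batches, axis=0)
    all_labels = np.stack([expert.act(s) for s in all_states])
    return policy_class().fit(all_states, all_labels)   # argmin over the policy class
```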