2019
DOI: 10.29007/25x3
A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning

Abstract: On-policy imitation learning algorithms such as DAgger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlying trajectory distribution is static (stationary), it is possible to prove convergence for DAgger. However, in more…
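The loop described in the abstract can be sketched in a few lines. The `env`, `supervisor`, and `policy` objects below, and their `reset`/`step`/`act`/`fit` methods, are hypothetical placeholders rather than an interface from the paper; the sketch only illustrates the execute, relabel, aggregate, refit cycle that DAgger-style algorithms iterate.

```python
import numpy as np

def on_policy_imitation(env, supervisor, policy, n_iters=20, horizon=100):
    """DAgger-style on-policy imitation learning loop (illustrative sketch)."""
    states, labels, losses = [], [], []
    for _ in range(n_iters):
        s, rollout = env.reset(), []
        for _ in range(horizon):
            a = policy.act(s)                  # execute the current policy
            rollout.append(s)
            s, done = env.step(a)
            if done:
                break
        targets = [supervisor.act(x) for x in rollout]   # corrective feedback from the supervisor
        # per-iteration performance (loss) of the executed policy on its own states
        losses.append(np.mean([np.linalg.norm(policy.act(x) - y)
                               for x, y in zip(rollout, targets)]))
        states += rollout
        labels += targets
        policy = policy.fit(np.asarray(states), np.asarray(labels))  # generate the next policy
    return policy, losses
```

Whether the per-iteration losses settle down as this loop runs is exactly the convergence question the paper studies.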

Cited by 75 publications (19 citation statements)
References 12 publications
“…We choose not to pursue algorithms with fast static regret rates in COL, as there have been studies on how algorithms can systematically leverage continuity in COL to accelerate learning (Cheng et al., 2019), though they are disguised as online IL research. On the contrary, less is known about dynamic regret, except for (Cheng et al., 2019; Lee et al., 2018) (also disguised as online IL), which study the convergence of FTL and mirror descent, respectively.…”
Section: Results (mentioning)
confidence: 99%
“…An early analysis of IL was framed using the adversarial, static regret setup (Ross et al., 2011). Recently, results were refined through the use of continuity in the bifunction and dynamic regret (Lee et al., 2018; Cheng et al., 2019). This problem again highlights the importance of treating stochasticity as the feedback.…”
Section: Examples (mentioning)
confidence: 99%
“…Using these insights of COL, we revisit online imitation learning (IL) [4] and show it can be framed as a COL problem. We demonstrate that, by using standard analyses of COL, we are able to recover and improve existing understanding of online IL algorithms [4,5,6]. In particular, we characterize existence and uniqueness of solutions, and present convergence and dynamic regret bounds for a common class of IL algorithms in deterministic and stochastic settings.…”
Section: Introduction (mentioning)
confidence: 91%
“…The use of online learning to analyze online IL is well established [4]. As studied in [5,6], these online losses can be formulated through a bifunction, $\ell_n(\pi) = f_{\pi_n}(\pi) = \mathbb{E}_{s \sim d_{\pi_n}}[c(s, \pi; \pi^\star)]$, and the policy class $\Pi$ can be viewed as the decision set $\mathcal{X}$. Naturally, this online learning formulation results in many online IL algorithms resembling standard online learning algorithms, such as follow-the-leader (FTL), which uses the full-information feedback $\ell_n(\cdot) = \mathbb{E}_{s \sim d_{\pi_n}}[c(s, \cdot; \pi^\star)]$ at each round [4], and mirror descent [23], which uses the first-order feedback…”
Section: Application to Online Imitation Learning (mentioning)
confidence: 99%
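As a concrete illustration of the full-information (FTL) feedback in this formulation, the sketch below approximates each loss $\ell_m(\pi) = \mathbb{E}_{s \sim d_{\pi_m}}[c(s, \pi; \pi^\star)]$ with the states sampled under $\pi_m$ and expert labels, so the FTL update $\pi_{n+1} = \arg\min_{\pi} \sum_{m \le n} \ell_m(\pi)$ reduces to refitting on the aggregated dataset. The `policy_class` and `expert` interfaces are assumptions for the example (scikit-learn-style `fit`, a callable `expert.act`), not APIs from the cited papers.

```python
import numpy as np

def ftl_policy_update(policy_class, state_batches, expert):
    """Follow-the-leader step for online IL (sketch).

    state_batches[m] holds states sampled from d_{pi_m} during round m's
    roll-out, and the expert pi* labels them. Minimizing the sum of the
    empirical per-round losses is then just a supervised fit on the union
    of the aggregated, expert-labeled data (the DAgger update).
    """
    all_states = np.concatenate(state_batches, axis=0)
    all_labels = np.stack([expert.act(s) for s in all_states])
    return policy_class().fit(all_states, all_labels)   # argmin over the policy class
```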