2022
DOI: 10.48550/arxiv.2203.01491
Preprint

The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches

Abstract: In this paper, we study the problem of regret minimization for episodic Reinforcement Learning (RL) both in the model-free and the model-based setting. We focus on learning with general function classes and general model classes, and we derive results that scale with the eluder dimension of these classes. In contrast to the existing body of work that mainly establishes instance-independent regret guarantees, we focus on the instance-dependent setting and show that the regret scales logarithmically with the hor…

Cited by 0 publications
References 22 publications (58 reference statements)