2017
DOI: 10.1287/moor.2016.0826
Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

Abstract: We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and, as a solution, propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function Q* lies within a known hypothesis class Q, OCP selects optimal actions over all but at most dim_E[Q] episodes, where dim_E denotes the eluder dimension. We establish further efficiency and asymptotic performance guarantees…
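To make the abstract's guarantee concrete, below is a minimal, hypothetical Python sketch of the optimism principle behind OCP in a finite-horizon deterministic system. Two simplifications to note: the hypothesis class Q is a finite list of tabular Q-functions, and "constraint propagation" degenerates to version-space elimination, whereas the paper's OCP propagates constraints through general (possibly infinite) classes. All names (ChainEnv, optimistic_episode, make_hypotheses) are illustrative, not from the paper.

```python
# Minimal sketch: optimistic action selection plus elimination of
# hypotheses inconsistent with observed deterministic transitions.

class ChainEnv:
    """Deterministic chain with states 0..H and actions {0, 1}: action 1
    moves right, action 0 stays; reward 1 only on entering state H."""

    actions = (0, 1)

    def __init__(self, horizon):
        self.horizon = horizon

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        nxt = self.state + 1 if action == 1 else self.state
        reward = 1.0 if nxt == self.horizon else 0.0
        self.state = nxt
        return nxt, reward


def true_q(horizon):
    """Tabular optimal Q: value 1 iff we are on schedule (s == t) and keep
    moving right (a == 1); otherwise state H is no longer reachable."""
    return {(t, s, a): 1.0 if (s == t and a == 1) else 0.0
            for t in range(horizon) for s in range(t + 1) for a in (0, 1)}


def make_hypotheses(horizon):
    """The true Q plus one corrupted copy per step, each falsely optimistic
    about staying put -- a toy stand-in for a richer hypothesis class."""
    hypotheses = [true_q(horizon)]
    for t in range(horizon):
        wrong = dict(true_q(horizon))
        wrong[(t, t, 0)] = 2.0
        hypotheses.append(wrong)
    return hypotheses


def optimistic_episode(env, hypotheses, horizon):
    """Act greedily w.r.t. the most optimistic surviving hypothesis, then
    prune hypotheses violating the observed deterministic Bellman
    constraints: Q(t,s,a) = r + max_a' Q(t+1,s',a'), and Q = r at t = H-1."""
    s, trajectory = env.reset(), []
    for t in range(horizon):
        # Optimistic choice: highest value promised by any surviving hypothesis.
        a = max(env.actions,
                key=lambda act: max(q[(t, s, act)] for q in hypotheses))
        s_next, r = env.step(a)          # deterministic transition
        trajectory.append((t, s, a, r, s_next))
        s = s_next

    def consistent(q):
        for t, s, a, r, s_next in trajectory:
            target = r if t == horizon - 1 else \
                r + max(q[(t + 1, s_next, a2)] for a2 in env.actions)
            if abs(q[(t, s, a)] - target) > 1e-9:
                return False
        return True

    return [q for q in hypotheses if consistent(q)]


if __name__ == "__main__":
    H = 4
    env = ChainEnv(H)
    hypotheses = make_hypotheses(H)
    episodes = 0
    while len(hypotheses) > 1:
        hypotheses = optimistic_episode(env, hypotheses, H)
        episodes += 1
    print(f"isolated the true Q-function after {episodes} episodes")
```

Roughly speaking, every suboptimal episode here discards at least one hypothesis, so with a finite class at most |Q| - 1 episodes can be suboptimal; the eluder dimension dim_E[Q] in the paper's bound generalizes this counting argument beyond finite enumeration.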

Citations: Cited by 25 publications (43 citation statements)
References: 25 publications (35 reference statements)
“…These assumptions remain much stronger than the realizability assumption considered herein, where only the optimal Q-function Q* is assumed to be linearly representable. Wen and Van Roy (2017) showed that sample-efficient RL is feasible in deterministic systems, which has been extended to stochastic systems with low variance in Du et al. (2020b) under additional gap assumptions. In addition, Weisz et al. (2021b) established exponential sample complexity lower bounds under the generative model when only Q* is linearly realizable; their construction critically relied on making the action set exponentially large.…”
Section: Additional Related Work
Mentioning confidence: 96%
“…This research was further extended to kernel and neural function approximation in recent work (Wang et al., 2020). Other approaches in this approximation setting are either computationally intractable (Krishnamurthy et al., 2016; Dann et al., 2018; Dong et al., 2020) or require strong assumptions on the transition model (Wen & Van Roy, 2017).…”
Section: Related Work
Mentioning confidence: 99%
“…There has been substantial recent theoretical interest in understanding the means by which we can avoid the curse of dimensionality and obtain sample-efficient reinforcement learning (RL) methods [Wen and Van Roy, 2017, Du et al., 2019c,b, Wang et al., 2019, Yang and Wang, 2019, Cai et al., 2020, Zanette et al., 2020, Zhou et al., 2020b,a, Modi et al., 2020, Ayoub et al., 2020]. Here, the extant body of literature largely focuses on sufficient conditions for efficient reinforcement learning.…”
Section: Introduction
Mentioning confidence: 99%