2021
DOI: 10.48550/arxiv.2106.07203
Preprint

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

Abstract: Designing provably efficient algorithms with general function approximation is an important open problem in reinforcement learning. Recently, Wang et al. [2020c] establish a value-based algorithm with general function approximation that enjoys O(poly(dH)√K) regret bound, where d depends on the complexity of the function class, H is the planning horizon, and K is the total number of episodes. However, their algorithm requires Ω(K) computation time per round, rendering the algorithm inefficient for practica…

Cited by 8 publications (43 citation statements)
References 24 publications

“…Proposed by Russo and Van Roy (2013), eluder dimension has become a widely-used concept to characterize the complexity of different function classes in bandits and RL (Ayoub et al., 2020; Jin et al., 2021; Kong et al., 2021). In this work, we define eluder dimension to characterize the complexity of the function F :…”
Section: A2 Eluder Dimension (mentioning)
confidence: 99%
“…This corresponds to the last term (KD) in Inq 35. Therefore, to design efficient algorithm with near-optimal regret in the infinite-horizon setting, the algorithm should maintain low-switching property (Bai et al., 2019; Kong et al., 2021). Taking inspiration from the recent work that studies efficient exploration with low switching cost in episodic setting (Kong et al., 2021), we define the importance score, $\sup_{f_1,f_2\in\mathcal{F}} \frac{\|f_1-f_2\|^2_{Z_{\mathrm{new}}}}{\|f_1-f_2\|^2_{Z}+\alpha}$, as a measure of the importance for new samples collected in current episode, and only update the optimistic model and the policy when the importance score is greater than 1.…”
Section: C3 Infinite Simulator Class (mentioning)
confidence: 99%
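
The update rule quoted above can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the quoted statement: the function class F is taken to be a small finite list of callables (so the supremum becomes a maximum over pairs), and the squared norm over a dataset Z is assumed to be the sum of squared differences over the points in Z. The names `sq_norm`, `importance_score`, and `should_update` are hypothetical.

```python
from itertools import combinations

import numpy as np


def sq_norm(f1, f2, Z):
    """Assumed definition: ||f1 - f2||_Z^2 = sum over z in Z of (f1(z) - f2(z))^2."""
    if len(Z) == 0:
        return 0.0
    diff = np.array([f1(z) - f2(z) for z in Z], dtype=float)
    return float(np.sum(diff ** 2))


def importance_score(F, Z, Z_new, alpha):
    """Max over pairs in a finite class F of
    ||f1 - f2||^2_{Z_new} / (||f1 - f2||^2_Z + alpha), with alpha > 0."""
    return max(
        (
            sq_norm(f1, f2, Z_new) / (sq_norm(f1, f2, Z) + alpha)
            for f1, f2 in combinations(F, 2)
        ),
        default=0.0,
    )


def should_update(F, Z, Z_new, alpha):
    """Trigger a model/policy update only when the score exceeds 1."""
    return importance_score(F, Z, Z_new, alpha) > 1.0


# Toy class of linear functions z -> w * z (illustrative only).
F = [lambda z, w=w: w * z for w in (0.0, 0.5, 1.0)]
Z = [1.0, 1.0]        # previously collected samples
alpha = 1.0

print(should_update(F, Z, [2.0], alpha))  # new sample far from old data
print(should_update(F, Z, [0.5], alpha))  # new sample adds little information
```

In this toy setup the first call triggers an update and the second does not, which matches the low-switching intent: the policy changes only when the new samples are informative relative to the data already held.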
“…Beyond linear function approximation, in the finite-horizon setting researchers also start considering theoretical guarantees for general function approximation (Wang et al, 2020;Ishfaq et al, 2021;Kong et al, 2021). The study for SSP, which again is a strict generalization of the finite-horizon problems and might be a better model for many applications, falls behind in this regard, motivating us to explore in this direction with the goal of providing a more complete picture at least for linear function approximation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Nonlinear generalizations: Some nonlinear generalizations of LMDPs have been proposed, such as the case where the state-action value function belongs to a class of bounded eluder dimension [Russo and Van Roy, 2013] or can be represented by a kernel function or neural network. While such generalization is important, to our knowledge, these works (see, e.g., Chowdhury and Oliveira [2020], Ishfaq et al. [2021], Kong et al. [2021], Wang et al. [2019], Yang et al. [2020b,a]) fail to improve over Jin et al. [2020], Zanette et al. [2020b] in terms of (P1), (P2), or (P3) (or regret).…”
Section: Related Work (mentioning)
confidence: 99%