2021
DOI: 10.48550/arxiv.2110.12727
Preprint

Learning Stochastic Shortest Path with Linear Function Approximation

Abstract: We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems the linear mixture SSP. We propose a novel algorithm for learning the linear mixture SSP, which can attain an $\tilde{O}(d B_\star^{1.5} \sqrt{K/c_{\min}})$ regret. Here $K$ is the number of episodes, $d$ is the dimension of the feature mapping in the mixture model, $B_\star$ bounds the expected cumulative cost of the optim…
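To make the phrase "linear mixture" concrete, here is a minimal sketch of a transition kernel built as a weighted sum of basis models. Everything in it is an illustrative assumption and not taken from the paper: the three-state toy problem, the single action, the two basis kernels in `BASIS`, the weights `theta`, and the helper name `mixture_kernel`.

```python
# Hypothetical toy setup (not from the paper): 3 states, state 2 is the
# absorbing goal, a single action, and d = 2 basis transition models.
BASIS = [
    # basis model 0: BASIS[0][s][s_next]
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
    # basis model 1: BASIS[1][s][s_next]
    [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.0, 0.0, 1.0]],
]


def mixture_kernel(theta, basis=BASIS):
    """Return P[s][s_next] = sum_i theta[i] * basis[i][s][s_next]."""
    n_states = len(basis[0])
    return [
        [
            sum(weight * model[s][s_next] for weight, model in zip(theta, basis))
            for s_next in range(n_states)
        ]
        for s in range(n_states)
    ]


# Assumed mixing weights; in the learning problem these would be unknown.
theta = [0.25, 0.75]
P = mixture_kernel(theta)

# Each row of the mixed kernel is still a probability distribution
# because the weights are nonnegative and sum to one.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12
```

In this sketch the learner's task would amount to estimating `theta` from observed transitions; the dimension `d` in the regret bound corresponds to the number of basis models mixed here.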

Cited by 5 publications (13 citation statements)
References 16 publications
“…Importantly, this bound is horizon-free in the sense that it has no polynomial dependency on $T_\star$ or $1/c_{\min}$ even in the lower order terms. Moreover, this almost matches the best known lower bound $\Omega(d B_\star \sqrt{K})$ from (Min et al, 2021).…”
Section: Introduction (supporting)
confidence: 85%
“…There is some gap between our result above and the existing lower bound $\Omega(d B_\star \sqrt{K})$ for this problem (Min et al, 2021). In particular, the dependency on $T_\star$ inherited from the $H$ dependency in Lemma 2 is most likely unnecessary.…”
Section: Applying An Efficient Finite-horizon Algorithm For Linear MDPs (mentioning)
confidence: 72%