2020 · Preprint
DOI: 10.48550/arxiv.2003.06898

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Fei Feng, Ruosong Wang, Wotao Yin, et al.

Abstract: We study how to use unsupervised learning for efficient exploration in reinforcement learning with rich observations generated from a small number of latent states. We present a novel algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret reinforcement learning algorithm. We show that our algorithm provably finds a near-optimal policy with sample complexity polynomial in the number of latent states, which is significantly smaller than the number of possible observations.
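
To make the two-component structure described in the abstract concrete, here is a minimal sketch, not the authors' actual algorithm: scikit-learn's KMeans stands in for the unsupervised learning component, and plain epsilon-greedy tabular Q-learning stands in for the no-regret tabular RL component. The environment interface (env.reset, env.step) and all names below are hypothetical placeholders, not from the paper.

    import numpy as np
    from sklearn.cluster import KMeans

    class Decoder:
        """Unsupervised component: cluster rich observations into n_latent decoded states."""
        def __init__(self, n_latent):
            self.kmeans = KMeans(n_clusters=n_latent, n_init=10)

        def fit(self, observations):
            # observations: array of shape (num_samples, obs_dim)
            self.kmeans.fit(observations)

        def decode(self, observation):
            # Map one rich observation to its decoded latent state index.
            return int(self.kmeans.predict(observation.reshape(1, -1))[0])

    def tabular_rl_on_decoded_states(env, decoder, n_latent, n_actions,
                                     episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
        """Tabular component: epsilon-greedy Q-learning over decoded latent states.
        The Q-table has n_latent rows, so its size is independent of the
        (possibly enormous) observation space."""
        Q = np.zeros((n_latent, n_actions))
        for _ in range(episodes):
            obs, done = env.reset(), False            # hypothetical env interface
            s = decoder.decode(obs)
            while not done:
                if np.random.rand() < eps:
                    a = np.random.randint(n_actions)  # explore
                else:
                    a = int(Q[s].argmax())            # exploit
                obs, r, done = env.step(a)            # hypothetical env interface
                s_next = decoder.decode(obs)
                target = r + gamma * (0.0 if done else Q[s_next].max())
                Q[s, a] += alpha * (target - Q[s, a])
                s = s_next
        return Q

The point the sketch makes concrete is that the tabular component operates entirely on decoded latent states: the Q-table has one row per latent state, so the RL sample complexity can scale with the number of latent states rather than with the number of possible observations, provided the decoder is accurate.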

Cited by 4 publications (4 citation statements) · References 32 publications

Citation statements, ordered by relevance:

“…Recently, many works have established that with additional assumptions, e.g. low-rankness of the transition, function approximation for Q-functions, etc., the sample complexity does not depend on |S| [Li et al., 2011, Wen and Van Roy, 2017, Krishnamurthy et al., 2016, Jiang et al., 2017, Dann et al., 2018, Du et al., 2019b, Feng et al., 2020, Du et al., 2019c, Zhong et al., 2019, Jin et al., 2019, Du et al., 2019a, Roy and Dong, 2019, Lattimore and Szepesvari, 2019, Zanette et al., 2020]. However, to our knowledge, the sample complexity of all these works scales polynomially with H, with the only exceptions requiring the transition to be deterministic [Wen and Van Roy, 2017, Du et al., 2020].…”
Section: Discussion and Further Open Problems
confidence: 99%
“…VALOR (Dann et al., 2018), PCID (Du et al., 2019a), HOMER (Misra et al., 2020), RegRL, and the approach from Feng et al. (2020) are algorithms for block MDPs, which is a more restricted setting than low-rank MDPs. These works require additional assumptions such as deterministic transitions (Dann et al., 2018), reachability (Misra et al., 2020; Du et al., 2019a), strong Bellman completion, and strong unsupervised learning oracles (Feng et al., 2020).…”
Section: Related Work
confidence: 99%
“…However, for real-world problems, the state space is often large, so we need to use function approximation. Developing provably efficient algorithms for RL problems with large state spaces has been a hot topic recently [Wen and Van Roy, 2013, Li et al., 2011, Du et al., 2019a, Krishnamurthy et al., 2016, Jiang et al., 2017, Dann et al., 2018, Du et al., 2019b, Sun et al., 2018, Du et al., 2019c, Feng et al., 2020a,b, Jin et al., 2019, Zanette et al., 2019a, Wang et al., 2020b, Cai et al., 2019, Ayoub et al., 2020]. These works are based on different assumptions.…”
Section: Related Work
confidence: 99%