2019
DOI: 10.48550/arxiv.1905.01576
Preprint

Learning to Control in Metric Space with Optimal Regret

Abstract: We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose that the transition dynamics and reward function are unknown, but the state and action spaces are endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that …
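To make the abstract's idea concrete, here is a minimal Python sketch of Lipschitz-based optimistic Q-learning for a deterministic finite-horizon problem. It is an illustration under stated assumptions, not the paper's exact algorithm: rewards are assumed to lie in [0, 1], the agent picks from a finite list `candidate_actions`, `dist` is the user-supplied metric on state-action pairs, `L` is a known Lipschitz constant, and `env` is a hypothetical environment exposing `reset() -> s` and `step(a) -> (s_next, r)`.

```python
def optimistic_q(data, s, a, h, H, L, dist):
    """Optimistic upper bound on Q_h(s, a) built from recorded targets.

    data[h] holds (s_i, a_i, q_i) triples; Lipschitz continuity of the
    optimal Q-function lets us extrapolate an upper bound to unvisited
    state-action pairs."""
    best = float(H - h)  # trivial bound: at most H - h steps of reward <= 1 remain
    for (s_i, a_i, q_i) in data[h]:
        best = min(best, q_i + L * dist((s, a), (s_i, a_i)))
    return best

def run_episode(env, data, H, L, dist, candidate_actions):
    """One episode: act greedily on the optimistic Q, then back up targets."""
    s, traj = env.reset(), []
    for h in range(H):
        # pick the action with the largest optimistic Q value at step h
        a = max(candidate_actions,
                key=lambda act: optimistic_q(data, s, act, h, H, L, dist))
        s_next, r = env.step(a)
        traj.append((h, s, a, r, s_next))
        s = s_next
    # Deterministic dynamics: each observed transition pins down the reward
    # and next state exactly, so Q targets can be backed up along the path.
    for (h, s, a, r, s_next) in reversed(traj):
        if h == H - 1:
            target = r
        else:
            target = r + max(optimistic_q(data, s_next, act, h + 1, H, L, dist)
                             for act in candidate_actions)
        data[h].append((s, a, target))
```

Here `data = {h: [] for h in range(H)}` initializes the experience store; repeating `run_episode` over K episodes gives the online loop, with the optimistic upper envelope tightening as experience accumulates.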

Cited by 6 publications (2 citation statements) | References 18 publications (21 reference statements)
“…They achieve a regret bound of $K^{\frac{d+1}{d+2}}$. Yang et al. (2019) consider a deterministic control system under a Lipschitz assumption on the optimal action-value function and the transition function, and they establish a regret of …”
Section: Related Work (mentioning)
Confidence: 99%
“…A common strategy is to use a UCB bonus to encourage exploration in less-visited states and actions. One can also study RL in metric spaces (Pazis and Parr, 2013; Song and Sun, 2019; Yang et al., 2019a). However, in general, this type of algorithm has an exponential dependence on the state dimension.…”
Section: Related Work (mentioning)
Confidence: 99%
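As a concrete illustration of the count-based UCB bonus mentioned in the last statement, here is a minimal tabular sketch; the bonus scale `c` and the `visit_count` table are illustrative assumptions, not taken from any cited paper:

```python
import math
from collections import defaultdict

visit_count = defaultdict(int)  # N(s, a): visits to each state-action pair

def ucb_bonus(s, a, c=1.0):
    """Exploration bonus added to the Q estimate; it shrinks as (s, a)
    accumulates visits, steering the agent toward less-visited pairs."""
    n = visit_count[(s, a)]
    return float("inf") if n == 0 else c / math.sqrt(n)
```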