2019
DOI: 10.48550/arxiv.1905.01576
Preprint

Learning to Control in Metric Space with Optimal Regret

Abstract: We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose that the transition dynamics and reward function are unknown, but the state and action spaces are endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that …
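To make the abstract's idea concrete, here is a minimal Python sketch of Lipschitz-based optimistic Q-learning for a deterministic finite-horizon problem. It is an illustration under stated assumptions, not the paper's exact algorithm: rewards are assumed to lie in [0, 1], the agent picks from a finite list `candidate_actions`, `dist` is the user-supplied metric on state-action pairs, `L` is a known Lipschitz constant, and `env` is a hypothetical environment exposing `reset() -> s` and `step(a) -> (s_next, r)`.

```python
def optimistic_q(data, s, a, h, H, L, dist):
    """Optimistic upper bound on Q_h(s, a) built from recorded targets.

    data[h] holds (s_i, a_i, q_i) triples; Lipschitz continuity of the
    optimal Q-function lets us extrapolate an upper bound to unvisited
    state-action pairs."""
    best = float(H - h)  # trivial bound: at most H - h steps of reward <= 1 remain
    for (s_i, a_i, q_i) in data[h]:
        best = min(best, q_i + L * dist((s, a), (s_i, a_i)))
    return best

def run_episode(env, data, H, L, dist, candidate_actions):
    """One episode: act greedily on the optimistic Q, then back up targets."""
    s, traj = env.reset(), []
    for h in range(H):
        # pick the action with the largest optimistic Q value at step h
        a = max(candidate_actions,
                key=lambda act: optimistic_q(data, s, act, h, H, L, dist))
        s_next, r = env.step(a)
        traj.append((h, s, a, r, s_next))
        s = s_next
    # Deterministic dynamics: each observed transition pins down the reward
    # and next state exactly, so Q targets can be backed up along the path.
    for (h, s, a, r, s_next) in reversed(traj):
        if h == H - 1:
            target = r
        else:
            target = r + max(optimistic_q(data, s_next, act, h + 1, H, L, dist)
                             for act in candidate_actions)
        data[h].append((s, a, target))
```

Here `data = {h: [] for h in range(H)}` initializes the experience store; repeating `run_episode` over K episodes gives the online loop, with the optimistic upper envelope tightening as experience accumulates.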

Cited by 6 publications (2 citation statements) | References 18 publications (21 reference statements)
“…They achieve a regret bound of $K^{\frac{d+1}{d+2}}$. Yang et al. (2019) consider a deterministic control system under a Lipschitz assumption on the optimal action-value function and the transition function, and they establish a regret of …”
Section: Related Work (mentioning)
Confidence: 99%
“…A common strategy is to use a UCB bonus to encourage exploration in less-visited states and actions. One can also study RL in metric spaces (Pazis and Parr, 2013; Song and Sun, 2019; Yang et al., 2019a). However, in general, this type of algorithm has an exponential dependence on the state dimension.…”
Section: Related Work (mentioning)
Confidence: 99%
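As a concrete illustration of the count-based UCB bonus mentioned in the last statement, here is a minimal tabular sketch; the bonus scale `c` and the `visit_count` table are illustrative assumptions, not taken from any cited paper:

```python
import math
from collections import defaultdict

visit_count = defaultdict(int)  # N(s, a): visits to each state-action pair

def ucb_bonus(s, a, c=1.0):
    """Exploration bonus added to the Q estimate; it shrinks as (s, a)
    accumulates visits, steering the agent toward less-visited pairs."""
    n = visit_count[(s, a)]
    return float("inf") if n == 0 else c / math.sqrt(n)
```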