2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
DOI: 10.1109/allerton.2019.8919864
Learning to Control in Metric Space with Optimal Regret

Cited by 7 publications (5 citation statements). References 7 publications.
“…While the optimal lower bound on the error of RL algorithms in the tabular setting has been established in [2,4], there are far fewer results on lower bounds for RL with function approximation. [26] proves an optimal lower bound for Lipschitz function approximation. [27] shows that even when the value function, policy function, reward function, and transition probability can all be approximated by linear functions, solving the RL problem may still require a number of samples exponential in the horizon.…”
Section: Our Contributions (mentioning)
confidence: 99%
“…In more structured environments, an agent can frequently take advantage of metric-based learning (Kakade, Kearns, and Langford 2003). Recent theoretical results have bounded cumulative regret by assuming Lipschitz continuity of either the optimal Q-function or the transition function, in both deterministic (Ni, Yang, and Wang 2019) and stochastic (Pazis and Parr 2013; Touati, Taiga, and Bellemare 2020) domains. However, due to their restrictive assumptions, these methods may be computationally impractical to apply as the dimensionality of the learning domain increases.…”
Section: Related Work (mentioning)
confidence: 99%
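For reference, the Lipschitz continuity assumption on the optimal Q-function referred to in this excerpt is typically of the following form, where the metric d and the constant L are generic placeholders rather than the notation of any particular cited paper:

\[
  |Q^*(x_1, a_1) - Q^*(x_2, a_2)| \;\le\; L \, d\big((x_1, a_1),\, (x_2, a_2)\big)
  \quad \text{for all state-action pairs } (x_1, a_1), (x_2, a_2).
\]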
“…As stated earlier, we would like knownness to quantify the similarity of a state-action to previously observed state-actions. We assume X is endowed with a metric, d(x_1, x_2), that quantifies such similarity, as is common in many works on continuous RL (Ni, Yang, and Wang 2019; Asadi, Misra, and Littman 2018). We define knownness as a function of the distance to the closest state-action which the agent has observed:…”
Section: Optimistic Initialization in Continuous MDPs (mentioning)
confidence: 99%
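The excerpt is cut off before the actual definition, so the following is only an illustrative sketch of a knownness score that decays with the distance to the nearest previously observed state-action; the exponential form and the bandwidth parameter are assumptions made here for illustration, not the cited paper's definition.

import math

# Illustrative sketch only: knownness as a decreasing function of the
# distance to the nearest previously observed state-action pair.
def knownness(x, observed, metric, bandwidth=1.0):
    # No observations yet: nothing is known.
    if not observed:
        return 0.0
    # Distance from x to its nearest neighbour among past observations.
    nearest = min(metric(x, y) for y in observed)
    # Score in (0, 1]: equals 1 when x has been observed exactly,
    # and decays toward 0 as the nearest distance grows.
    return math.exp(-nearest / bandwidth)

# Example with a Euclidean metric over concatenated (state, action) vectors.
history = [(0.0, 0.1), (1.0, -0.2)]
print(knownness((0.1, 0.1), history, metric=math.dist))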
“…Related Literature. While the optimal lower bound on the error of RL algorithms in the tabular setting has been established in [3,4], there are far fewer results on lower bounds for RL with function approximation. [31] proves an optimal lower bound for Lipschitz function approximation. [14] shows that even when the value function, policy function, reward function, and transition probability can all be approximated by linear functions, solving the RL problem may still require a number of samples exponential in the horizon.…”
Section: Our Contribution (mentioning)
confidence: 99%