2021
DOI: 10.1115/1.0001814v
Preprint

Reliability-Based Reinforcement Learning Under Uncertainty

Cited by 2 publications (5 citation statements)
References 0 publications
“…The above definition of robust Q-values reduces to the known Q-values defined for the sa-rectangular R-contamination uncertainty set in [25] and the sa-rectangular L_p constrained uncertainty set in [12, 4].…”
Section: Discussion
confidence: 99%
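For context, the robust Q-value referenced in this statement is typically defined as a worst case over the uncertainty set. A minimal sketch of the standard form, assuming an sa-rectangular set \mathcal{P}_{sa} per state-action pair (notation ours, not necessarily the citing paper's):

```latex
% Robust Q-value: the adversary picks the worst-case transition
% kernel independently for each (s, a) pair.
Q^{\pi}(s,a) = \min_{P \in \mathcal{P}_{sa}} \Big[ r(s,a)
             + \gamma \sum_{s'} P(s' \mid s,a)\, V^{\pi}(s') \Big]
```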
“…Without some structural assumptions on the uncertainty set, solving robust MDPs can be NP-hard [28]. Therefore, to preserve tractability, we often assume that the uncertainty set is convex and s-rectangular, that is, it can be expressed as a Cartesian product over states [18, 7, 28, 4, 12, 25]. In that case, standard solvers for MDPs carry over to robust MDPs.…”
Section: Introduction
confidence: 99%
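To illustrate why rectangularity preserves tractability, here is a minimal NumPy sketch of robust value iteration under an sa-rectangular R-contamination set, the set type named in the statement above, where the worst case has a closed form. The function name and toy MDP are illustrative, not from the cited papers:

```python
import numpy as np

def robust_backup_r_contamination(R, P, V, gamma=0.99, delta=0.1):
    """One robust Bellman backup over an sa-rectangular R-contamination set.

    For each (s, a) the adversary mixes the nominal kernel P[s, a] with an
    arbitrary distribution at weight delta; the worst case reduces to
    (1 - delta) * <P[s, a], V> + delta * min_{s'} V(s').
    R: (S, A) rewards, P: (S, A, S) nominal kernel, V: (S,) value estimate.
    """
    worst = (1.0 - delta) * (P @ V) + delta * V.min()  # (S, A) worst-case next value
    Q = R + gamma * worst
    return Q.max(axis=1)                               # greedy robust value update

# toy usage: S=3 states, A=2 actions, random nominal MDP
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))
R = rng.random((3, 2))
V = np.zeros(3)
for _ in range(200):
    V = robust_backup_r_contamination(R, P, V)
print(V)
```

Because the uncertainty set factors over (s, a) pairs, the inner minimization decomposes per backup, so this loop has the same cost as standard value iteration.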
“…Ghugare et al. (2022) optimize an objective for learning a latent-space model and policy jointly, aiming to maximize a lower bound on the overall RL objective. An et al. (2021) propose uncertainty-based methods that guide the Q-value function update using the data with high confidence. Others address this problem by imitating experts (Zolna et al., 2020) or learning ensembles (Agarwal et al., 2020).…”
Section: Related Work
confidence: 99%
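As a concrete illustration of the uncertainty-based Q-update mentioned here, below is a minimal PyTorch sketch of an ensemble-minimum TD target in the spirit of An et al. (2021). The class and function names are our own illustration, not the authors' code:

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Tiny Q(s, a) network; illustrative architecture only."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, obs, act):
        return self.f(torch.cat([obs, act], dim=-1)).squeeze(-1)

def ensemble_min_target(q_nets, next_obs, next_act, reward, done, gamma=0.99):
    # Pessimistic TD target: the ensemble minimum down-weights state-action
    # pairs on which the Q-estimates disagree, i.e., low-confidence data.
    with torch.no_grad():
        q_next = torch.stack([q(next_obs, next_act) for q in q_nets])  # (N, B)
        return reward + gamma * (1.0 - done) * q_next.min(dim=0).values
```

The design choice is that ensemble disagreement acts as an uncertainty proxy: the minimum over N estimates penalizes actions poorly supported by the dataset without an explicit behavior model.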
“…Offline RL, also known as batch RL, plays an appealing alternative role (An et al., 2021; Fujimoto and Gu, 2021; Fujimoto et al., 2019). In direct contrast to online RL, offline RL acquires effective policies from previously collected large-scale data, without online interaction during training.…”
Section: Introduction
confidence: 99%
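To make the online/offline distinction concrete, a minimal tabular sketch of learning purely from a fixed batch of transitions, with no environment calls during training (the toy data and names are illustrative, not from the cited works):

```python
import numpy as np

def fitted_q_iteration(dataset, n_states, n_actions, gamma=0.99, iters=100):
    """Tabular offline RL sketch: learn Q from a static batch of
    (s, a, r, s') tuples only. Real offline RL methods additionally add
    pessimism or policy constraints to handle distribution shift."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        for s, a, r, s2 in dataset:            # replay the fixed batch
            Q[s, a] = r + gamma * Q[s2].max()  # fitted Q backup
    return Q

# usage: a hand-coded 2-state dataset collected by some prior behavior policy
data = [(0, 0, 1.0, 1), (1, 1, 0.0, 0), (0, 1, 0.5, 0)]
Q = fitted_q_iteration(data, n_states=2, n_actions=2)
print(Q)
```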