2022
DOI: 10.48550/arxiv.2201.01666
Preprint

Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation

Abstract: In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency. As this noise is heteroscedastic, its effects can be mitigated using uncertainty-based weights in the optimization process. Previous methods rely on sampled ensembles, which do not capture all aspects of uncertainty. We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL, and introduce i…
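The idea the abstract describes, down-weighting noisy value targets by their estimated uncertainty, can be illustrated with an inverse-variance weighted regression loss. The sketch below is an illustration only, not the authors' implementation; the per-target variance input `target_variance` and the constant `eps` are assumptions made for the example.

```python
import torch

def inverse_variance_weighted_loss(q_pred, q_target, target_variance, eps=1e-6):
    """Weight each squared TD error by the inverse of the estimated variance
    of its target, so noisier (heteroscedastic) targets contribute less to
    the gradient.

    q_pred:          predicted Q-values, shape (batch,)
    q_target:        bootstrapped value targets, shape (batch,)
    target_variance: estimated variance of each target, shape (batch,)
    """
    weights = 1.0 / (target_variance + eps)
    # Normalize the weights so the overall loss scale stays comparable
    # to an unweighted mean over the minibatch.
    weights = weights / weights.sum()
    return (weights * (q_pred - q_target.detach()) ** 2).sum()
```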

Cited by 4 publications (6 citation statements)
References 27 publications
“…To test the proposed algorithm in a more realistic setting, we also run our algorithm on an autonomous vehicle driving simulator [29] in a highway domain, where rewards are designed to penalize unsafe driving behavior. The sparsity and risk-sensitivity of rewards in these tasks make Baselines: We compare the performance of UUaE to three recently proposed algorithms: SUNRISE [9], DLTV [21], and IV-DQN [24], which respectively act based on the epistemic uncertainty, aleatory uncertainty, and additive formulation of epistemic and aleatory uncertainty. Setup: We implement the baselines using their original implementations.…”
Section: Methods
confidence: 99%
“…Mavrin et al [21] proposed a method to suppress the effect of aleatory uncertainty by applying a decay schedule, but this method does not consider epistemic uncertainty. Some other works [22,23,24] estimate epistemic and aleatory uncertainty separately and combine them using a weighted sum. However, due to the reducibility of epistemic uncertainty and non-reducibility of aleatory uncertainty during training, this additive formulation underestimates the integrated effect of epistemic uncertainty.…”
Section: Introduction
confidence: 99%
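For concreteness, the additive formulation criticized in the statement above can be sketched as a weighted sum of separately estimated uncertainties. This is a minimal illustration, not code from any of the cited works; the function name and the coefficients `w_epi` and `w_ale` are assumptions for the example.

```python
import numpy as np

def combined_uncertainty(epistemic, aleatory, w_epi=1.0, w_ale=1.0):
    """Additive formulation: total uncertainty as a weighted sum of an
    epistemic estimate (e.g. ensemble disagreement) and an aleatory
    estimate (e.g. spread of a learned return distribution).

    As the citing paper notes, epistemic uncertainty shrinks with training
    while aleatory uncertainty does not, so a fixed weighted sum can
    understate the epistemic component later in training.
    """
    return w_epi * np.asarray(epistemic) + w_ale * np.asarray(aleatory)
```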
“…Building upon many of the above techniques and approaches from supervised learning, Mai, Mani, and Paull (2022) present a method for "inverse-variance" reinforcement learning with decoupled uncertainty estimates. Specifically, they modify the value function loss over a minibatch to be L = L_BIV + λ·L_LA.…”
Section: Combinations and Other Approaches
confidence: 99%
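The combined objective quoted above can be sketched in a few lines, reading L_BIV as a batch inverse-variance weighted regression term and L_LA as a standard heteroscedastic loss-attenuation penalty. The exact forms below, the variable names, and the default value of `lam` are assumptions based on those common definitions, not the authors' code.

```python
import torch

def biv_la_loss(q_pred, q_target, target_var, pred_log_var, lam=0.5, eps=1e-6):
    """Sketch of a combined objective L = L_BIV + lambda * L_LA.

    L_BIV: batch inverse-variance weighting, where each squared error is
           weighted by 1 / (estimated target variance + eps) and the
           weights are normalized over the minibatch.
    L_LA:  loss attenuation, where the network also predicts a log-variance
           for its own output and is penalized for large variances.
    """
    err2 = (q_pred - q_target.detach()) ** 2

    # Batch inverse-variance term.
    w = 1.0 / (target_var + eps)
    w = w / w.sum()
    l_biv = (w * err2).sum()

    # Loss-attenuation term (heteroscedastic regression penalty).
    l_la = (err2 * torch.exp(-pred_log_var) + pred_log_var).mean()

    return l_biv + lam * l_la
```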
“…The concrete algorithm used here might be what is called the delta rule; see Korenberg and Ghahramani (2002) as an example of a related if slightly more complex model. Mai et al (2022) propose a closely related weighting in the context of reinforcement learning and RPE. See also footnote 15 in Chapter 5.…”
Section: Reducing Certainty Attributed To Perception and Concepts
confidence: 99%