2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)
DOI: 10.1109/ictai.2018.00015

Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning

Abstract: Learning in sparse reward settings remains a challenge in Reinforcement Learning, which is often addressed by using intrinsic rewards. One promising strategy is inspired by human curiosity, requiring the agent to learn to predict the future. In this paper, a curiosity-driven agent is extended to use these predictions directly for training. To achieve this, the agent predicts the value function of the next state at any point in time. Subsequently, the consistency of this prediction with the current value function…
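The abstract only sketches the mechanism, so the following is a minimal illustration of how a value-consistency regularizer of this kind could look. It is a sketch under assumptions, not the paper's implementation: the encoder, forward model, and value head below are placeholder MLPs, and the stop-gradient placement and loss form are illustrative choices rather than details stated in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CuriosityValueAgent(nn.Module):
    """Toy module with a feature encoder, a forward model, and a value head."""

    def __init__(self, obs_dim, act_dim, feat_dim=64):
        super().__init__()
        # Observation encoder phi(s).
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        # Forward model: predicts phi(s_{t+1}) from phi(s_t) and the action taken.
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        # Value head V(.) applied to state features.
        self.value_head = nn.Linear(feat_dim, 1)

    def curiosity_and_value_losses(self, obs, act_onehot, next_obs):
        phi_t = self.encoder(obs)
        phi_next = self.encoder(next_obs)
        phi_next_pred = self.forward_model(torch.cat([phi_t, act_onehot], dim=-1))

        # Curiosity-style intrinsic reward: error of the feature prediction.
        intrinsic_reward = 0.5 * (phi_next_pred - phi_next.detach()).pow(2).mean(dim=-1)

        # Value-consistency regularizer: the value assigned to the *predicted*
        # next state should agree with the value of the state actually observed.
        v_of_prediction = self.value_head(phi_next_pred).squeeze(-1)
        v_of_observation = self.value_head(phi_next).squeeze(-1).detach()
        value_reg_loss = F.mse_loss(v_of_prediction, v_of_observation)

        return intrinsic_reward, value_reg_loss

In a full agent this regularizer would simply be added, with a small weight, to the usual actor-critic objective, e.g. loss = policy_loss + value_loss + lambda_reg * value_reg_loss.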

Cited by 4 publications (4 citation statements)
References 4 publications (17 reference statements)
“…ings (Carvalho et al, 2019; Rogers et al, 2020; Braşoveanu and Andonie, 2020). Such studies range from, for example, observing attention weights (Clark et al, 2019; Kovaleva et al, 2019; Reif et al, 2019; Lin et al, 2019; Mareček and Rosa, 2019; Htut et al, 2019; Raganato and Tiedemann, 2018), gradients (Brunner et al, 2020), and value-weighted vector norms (Kobayashi et al, 2020). The analysis scope has been further extended from attention only to including RES1 (Abnar and Zuidema, 2020), RES1 and LN1 (Kobayashi et al, 2021), and RES1, LN1, and LN2 (Modarressi et al, 2022).…”
Section: All the Scopes (mentioning)
confidence: 99%
“…Besides, intrinsic rewards can be modeled based on comparisons between the current observation and the past episodic memories [54]. Moreover, the differences between the actual and predicted consequences can also be regarded as a measure of surprise [55], [56]. Generally, the latter dynamic-based rewards are straightforward to scale and parallelize [57].…”
Section: Related Work (mentioning)
confidence: 99%
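As an illustration of the episodic-memory alternative mentioned in the statement above (in contrast to the prediction-error "surprise" signal sketched earlier), here is a minimal, assumption-laden sketch: the function name, the kNN-distance formulation, and the choice to reward the mean distance to the nearest stored embeddings are illustrative and not taken from the cited papers.

import torch

def episodic_novelty_reward(embedding, memory, k=10):
    """Return a larger intrinsic reward the farther the current observation
    embedding lies from its k nearest neighbours in this episode's memory."""
    if memory.shape[0] == 0:
        return torch.tensor(1.0)  # nothing stored yet: treat as maximally novel
    dists = torch.cdist(embedding.unsqueeze(0), memory).squeeze(0)  # distances to all stored embeddings
    knn = torch.topk(dists, k=min(k, dists.numel()), largest=False).values
    return knn.mean()

The intrinsic term would typically be mixed into the environment reward, e.g. r = r_ext + beta * episodic_novelty_reward(phi(s_t), memory), with the embedding of each visited state appended to the memory afterwards.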
“…This can enable faster training, since it can preempt the need for performing expensive simulations of the environment. Predicting latent representation was also proposed in [Brunner et al, 2018] as a regularization method for reinforcement learning.…”
Section: Related Work (mentioning)
confidence: 99%