2019
DOI: 10.48550/arxiv.1912.02503
Preprint

Hindsight Credit Assignment

Abstract: We consider the problem of efficient credit assignment in reinforcement learning. In order to efficiently and meaningfully utilize new data, we propose to explicitly assign credit to past decisions based on the likelihood of them having led to the observed outcome. This approach uses new information in hindsight, rather than employing foresight. Somewhat surprisingly, we show that value functions can be rewritten through this lens, yielding a new family of algorithms. We study the properties of these algorithm…
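
The abstract's key claim, that value functions can be rewritten in terms of the hindsight likelihood of past decisions, can be made concrete with a short identity. The sketch below gives the return-conditioned form of such a rewrite; the notation (state x, return Z, hindsight distribution h) is supplied here for illustration and is not fixed by the truncated abstract itself.

    % Sketch: hindsight rewrite of the action-value function.
    % Z is the random return from state x under policy \pi, and h(a | x, z)
    % is the hindsight probability that action a was taken at x given that
    % return z was observed.
    Q^{\pi}(x, a)
      = \mathbb{E}_{Z \sim P^{\pi}(\cdot \mid x)}
        \left[ \frac{h(a \mid x, Z)}{\pi(a \mid x)} \, Z \right],
    \qquad
    h(a \mid x, z) = \frac{\pi(a \mid x) \, P^{\pi}(z \mid x, a)}{P^{\pi}(z \mid x)} .

Substituting the Bayes-rule definition of h collapses the ratio to P^pi(z | x, a) / P^pi(z | x), so the expectation reduces to E[Z | x, a] = Q^pi(x, a): an action earns credit exactly in proportion to how much more likely it becomes once the outcome is known.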

Cited by 3 publications (3 citation statements)
References 4 publications

“…The reward function is an important part of deep reinforcement learning, as it determines the speed and degree of convergence of reinforcement learning algorithms. Work on reward shaping [39, 40] has studied the problem that the agent does not recognize key actions and is not motivated to explore in more complex scenarios. With sparse rewards, a meaningful reward signal is unavailable most of the time during training, and without feedback it is difficult for the agent to learn in the direction of the goal.…”
Section: Materials and Methods
confidence: 99%
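
The reward-shaping references [39, 40] are not quoted here, so the snippet below is only a generic illustration of the technique the passage names: potential-based shaping, which adds gamma * phi(s') - phi(s) to the environment reward to supply dense per-step feedback while provably preserving optimal policies (Ng et al., 1999). The potential function phi is a hypothetical stand-in for this sketch.

    # Illustrative potential-based reward shaping (a generic sketch, not
    # the specific methods of refs [39, 40]; `phi` is a hypothetical
    # potential function chosen for this example).

    def shaped_reward(r, s, s_next, phi, gamma=0.99, done=False):
        """Add a dense shaping term to a sparse environment reward.

        Shaping of the form gamma * phi(s') - phi(s) gives the agent
        feedback on every step while leaving optimal policies unchanged
        (Ng et al., 1999).
        """
        next_potential = 0.0 if done else phi(s_next)
        return r + gamma * next_potential - phi(s)

    # Example: a 1-D corridor with the goal at position 10. The potential
    # is the negative distance to the goal, so moving closer yields an
    # immediate positive signal even though the environment reward is 0.
    phi = lambda s: -abs(10 - s)
    print(shaped_reward(0.0, s=4, s_next=5, phi=phi))   # ~1.05 > 0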
“…[46][47][48] Rewards may be sparse, especially with delayed feedback, and the benefit of intermediate actions may not be immediately obvious.…”
Section: Adaptive Management Challenge RL Methods and Concepts Citati…
confidence: 99%
“…Many works have studied better credit assignment via state-association, learning an architecture which decomposes the reward function such that certain "important" states comprise most of the credit [50, 51, 12]. They use the learned reward function to change the reward of an actor-critic algorithm to help propagate signal over long horizons.…”
Section: Credit Assignment
confidence: 99%
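
As a minimal sketch of the general idea the quote describes, redistributing an episode's return across its steps through a learned model, the snippet below differences a return predictor's running outputs to obtain dense per-step credit. The names ReturnPredictor and redistributed_rewards are hypothetical, and this is not the specific architecture of [50, 51, 12].

    # Sketch of return decomposition / reward redistribution. A model
    # predicts the episode return from each trajectory prefix; changes in
    # that prediction serve as per-step rewards, so the "important" steps
    # receive most of the credit.

    import torch
    import torch.nn as nn

    class ReturnPredictor(nn.Module):
        """Predicts the episode return from each prefix of a trajectory."""
        def __init__(self, obs_dim, hidden=64):
            super().__init__()
            self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, traj):               # traj: (batch, T, obs_dim)
            h, _ = self.rnn(traj)              # hidden state at every step
            return self.head(h).squeeze(-1)    # (batch, T) running predictions

    def redistributed_rewards(model, traj):
        """Per-step credit = change in the predicted return.

        The differences telescope to the final prediction, so the credit
        summed over the episode equals the predicted return, while steps
        that move the prediction most receive most of the credit.
        """
        with torch.no_grad():
            g = model(traj)                    # (batch, T)
        prev = torch.cat([torch.zeros_like(g[:, :1]), g[:, :-1]], dim=1)
        return g - prev                        # dense proxy rewards

These proxy rewards would then replace or augment the environment reward in an actor-critic update, matching the quote's description; the predictor itself would be trained by regressing its final-step output onto the observed episode return.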