Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence 2017
DOI: 10.24963/ijcai.2017/656

Reinforcement Learning with a Corrupted Reward Channel

Abstract: No real-world reward function is perfect. Sensory errors and software bugs may result in agents getting higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called a Corrupt Reward MDP (CRMDP). Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the corrupt rewards.
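A minimal sketch of the idea (the notation below is assumed for illustration, not the paper's exact definition): a corrupt-reward setting can be viewed as an MDP in which a state-dependent corruption function distorts the true reward before the agent observes it.

\[
  \mathcal{M} \;=\; \langle S, A, T, \dot{R}, C \rangle,
  \qquad
  \hat{R}(s) \;=\; C_s\bigl(\dot{R}(s)\bigr),
\]

where \(\dot{R} : S \to \mathbb{R}\) is the true reward, \(C_s : \mathbb{R} \to \mathbb{R}\) is the corruption applied in state \(s\), and the agent only ever observes the corrupt reward \(\hat{R}\). An agent that naively maximises \(\hat{R}\) can be drawn to states where the corruption, rather than the true reward, is large.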

Cited by 53 publications (42 citation statements)
References 6 publications
“…An AGI may be tempted to influence the data training its reward function so it points towards simple-to-optimize reward functions rather than harder ones (Armstrong, 2015). Everitt, Krakovna, et al. (2017) show that the type of data the agent receives matters for reward learning corruption. In particular, if the reward data can be cross-checked between multiple sources, then the reward corruption incentive diminishes drastically.…”
Section: Value Specification (mentioning)
confidence: 97%
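As a toy illustration only (the sensor and function names below are hypothetical, and this is not the authors' construction): if the same reward can be read from several independent sources, aggregating them with a median means a single corrupted channel cannot dominate the observed reward.

import random
import statistics

def true_reward(state):
    # Hypothetical ground-truth reward, for illustration only.
    return 1.0 if state == "goal" else 0.0

def corrupted_sensor(state):
    # A faulty channel that sometimes reports the maximum reward regardless of state.
    return 10.0 if random.random() < 0.3 else true_reward(state)

def clean_sensor(state):
    return true_reward(state)

def cross_checked_reward(state, sensors):
    # Median over several sources: a minority of corrupted channels cannot move it.
    return statistics.median(sensor(state) for sensor in sensors)

sensors = [corrupted_sensor, clean_sensor, clean_sensor]
print(cross_checked_reward("goal", sensors))   # 1.0, despite the faulty channel
print(cross_checked_reward("other", sensors))  # 0.0

As long as corrupted channels are a minority, the majority of sources determines the observed reward, which is the intuition behind the diminished corruption incentive described in the quoted statement.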
“…Commonsense failure goals occur when an intelligent agent does not achieve the desired result because part of the goal, or the way the goal should have been achieved, is left unstated (this is also referred to as a corrupted goal or corrupted reward; Everitt, Krakovna, Orseau, Hutter, & Legg, 2017). Why would this happen?…”
Section: Understanding Humans (mentioning)
confidence: 99%
“…Policy makers may want to address certain safety risks which are directly linked to reinforcement learning. This includes, among others, reward hacking [45,46] and interruptibility [47]. The following definition could be used:…”
Section: Design (mentioning)
confidence: 99%