2019
DOI: 10.48550/arxiv.1902.04257
Preprint

Deep Reinforcement Learning from Policy-Dependent Human Feedback

Abstract: To widen their accessibility and increase their utility, intelligent agents must be able to learn complex behaviors as specified by (non-expert) human users. Moreover, they will need to learn these behaviors within a reasonable amount of time while efficiently leveraging the sparse feedback a human trainer is capable of providing. Recent work has shown that human feedback can be characterized as a critique of an agent's current behavior rather than as an alternative reward signal to be maximized, culminating i…

Cited by 23 publications (37 citation statements)
References: 12 publications
“…Learning from human feedback. Several works have successfully utilized feedback from real humans to train agents where it is assumed that the feedback is available at all times (Pilarski et al, 2011; MacGlashan et al, 2017; Arumugam et al, 2019). Due to this high feedback frequency, these approaches are difficult to scale to more complex learning problems that require substantial agent experience.…”
Section: Related Work (mentioning)
confidence: 99%
“…Low-level manipulation: Surgical assistance [49,68]; Vehicle manipulation [80]; Robotic arm [61]; VR teleoperation [79]. High-level tasks: 2D gameplay [59]; 3D gameplay [4]; Navigation [28]; Sports analysis [78]. … evaluated on 3D games like GTAV or Minecraft for evaluation. This taxonomy could be meaningful since it clearly reflects the target domain of the proposed algorithm, as the variance on their evaluation methods could be smaller, this may help to design a unified evaluation metric for IL.…”
Section: Classes Examples and Publications (mentioning)
confidence: 99%
“…This insight motivated the design of the Convergent Actor-Critic by Humans (COACH) algorithm, which treats human feedback as an advantage signal. Subsequent work has also successfully applied COACH to training deep learning agents [108].…”
Section: Applying Human Inverse Models To the Design Of Autonomous Sy... (mentioning)
confidence: 99%
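
The statement above says that COACH treats human feedback as an advantage signal. As a rough illustration of that idea only, the sketch below applies a plain policy-gradient step to a tabular softmax policy, with a scalar human feedback value standing in for the advantage. It is not the published COACH or Deep COACH implementation (which, for instance, also uses eligibility traces); the function names, the step size, and the three-action setup are all assumptions made for this example.

```python
import math


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def feedback_update(prefs, action, feedback, step_size=0.5):
    """One policy-gradient step using human feedback in place of the advantage.

    For a softmax policy, the gradient of log pi(action) with respect to
    prefs[i] is (1 if i == action else 0) - pi(i). The names and the
    step size here are hypothetical, chosen only for illustration.
    """
    probs = softmax(prefs)
    return [
        p + step_size * feedback * ((1.0 if i == action else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]


# Three actions, initially equally likely.
prefs = [0.0, 0.0, 0.0]
before = softmax(prefs)

# The trainer approves of action 1 (feedback = +1).
prefs = feedback_update(prefs, action=1, feedback=+1.0)
after_reward = softmax(prefs)

# The trainer disapproves of action 2 (feedback = -1).
prefs = feedback_update(prefs, action=2, feedback=-1.0)
after_punish = softmax(prefs)
```

Positive feedback raises the probability of the action just taken, and negative feedback lowers it. Because the feedback scales the policy gradient directly rather than being accumulated as a return, it acts as a critique of the current behavior, which matches the framing in the abstract.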