2018
DOI: 10.1007/s10846-018-0839-z

An Interactive Framework for Learning Continuous Actions Policies Based on Corrective Feedback

Cited by 37 publications (61 citation statements)
References 38 publications
“…Corrective feedback has been used in Argall et al. (2008, 2011), wherein policies for continuous-action problems are learned from human corrective advice; this kind of feedback was also shown to be faster than critic-only RL algorithms in the reported experiments, even when the users were non-experts (Celemin and Ruiz-del Solar 2015, 2018).…”
Section: Background and Related Work
Mentioning (confidence: 90%)
“…Corrective feedback advised by human teachers is used in the introduced approach, similarly to the aforementioned hybrid learning systems based on RL and human reinforcements. In the proposed approach, human knowledge is provided to the PS learning agents as corrective advice using the COACH algorithm (Celemin and Ruiz-del Solar 2015), which has outperformed some purely autonomous RL agents and purely interactive learning agents based on human reinforcements, and has proven useful in continuous-action problems such as cart-pole balancing, bike balancing, and navigation for humanoid robots (Celemin and Ruiz-del Solar 2018).…”
Section: Introduction
Mentioning (confidence: 99%)
“…Unlike DRL, where the policy is updated with information collected at every time step, in COACH-like methods new data to update the policy is only available when feedback is given by the teacher, so the amount of data used to update the policy may be lower than in the RL case. Since the original COACH has already been widely validated with real human teachers in several tasks, in this work we carried out most of the comparisons using a simulated teacher (a high-performance policy standing in as teacher, which was actually trained with D-COACH and a real human teacher), as in some of the experiments presented in [6], in order to compare the methods under more controlled conditions. The simulated teacher generates feedback using h = sign(a_teacher − a_agent), whereas the decision to advise feedback at each time step is given by the probability P_h = α · exp(−τ · timestep), where α ∈ ℝ with 0 ≤ α ≤ 1 and τ ∈ ℝ with τ ≥ 0.…”
Section: Validation of Replay Buffer with Simulated Teachers
Mentioning (confidence: 99%)
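The simulated-teacher rule quoted above is simple enough to sketch directly. The snippet below is a minimal illustration rather than the authors' code: it returns h = sign(a_teacher − a_agent) with probability P_h = α · exp(−τ · timestep), and the function name and default values of α and τ are placeholders.

```python
import numpy as np

def simulated_teacher_feedback(a_teacher, a_agent, timestep,
                               alpha=0.6, tau=0.0003, rng=None):
    """Sketch of the quoted simulated-teacher rule; alpha/tau defaults are illustrative."""
    if rng is None:
        rng = np.random.default_rng()
    # Probability of advising feedback decays with the time step: P_h = alpha * exp(-tau * t)
    p_h = alpha * np.exp(-tau * timestep)
    if rng.random() < p_h:
        # Binary corrective signal per action dimension: h = sign(a_teacher - a_agent)
        return np.sign(np.asarray(a_teacher) - np.asarray(a_agent))
    return None  # the teacher stays silent at this time step
```

In a training loop this would be called once per time step, and the learner would only perform a policy update when a non-None correction is returned.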
“…We combine Deep Learning (DL) with the corrective-advice-based learning framework COrrective Advice Communicated by Humans (COACH) [6], thus creating the Deep COACH (D-COACH) framework. In this approach, no reward functions are needed and the number of learning episodes is significantly reduced in comparison to alternative approaches.…”
Section: Introduction
Mentioning (confidence: 99%)
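Combining COACH with a neural-network policy, as described above, can be illustrated with a hedged sketch; this is not the authors' implementation. It assumes the corrective signal h converts the executed action into a supervised target a + e·h, that (observation, target) pairs are kept in a replay buffer (as discussed in the earlier excerpt), and that the network is trained on sampled mini-batches with a mean-squared-error loss; the error magnitude e, buffer size, and architecture are illustrative choices.

```python
import collections
import random

import torch
import torch.nn as nn


class DCoachSketch:
    """Hedged sketch of a D-COACH-style learner; constants and architecture are illustrative."""

    def __init__(self, obs_dim, act_dim, e=0.05, buffer_size=10000, lr=1e-3):
        self.policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                    nn.Linear(64, act_dim), nn.Tanh())
        self.opt = torch.optim.Adam(self.policy.parameters(), lr=lr)
        self.e = e                                    # assumed magnitude of each correction
        self.buffer = collections.deque(maxlen=buffer_size)

    def act(self, obs):
        with torch.no_grad():
            return self.policy(torch.as_tensor(obs, dtype=torch.float32))

    def feedback_update(self, obs, action, h, batch_size=32):
        # The corrective signal turns the executed action into a supervised label.
        target = action + self.e * torch.as_tensor(h, dtype=torch.float32)
        self.buffer.append((torch.as_tensor(obs, dtype=torch.float32), target))
        # Train on a small batch drawn from the replay buffer (no reward involved).
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        obs_b = torch.stack([o for o, _ in batch])
        tgt_b = torch.stack([t for _, t in batch])
        loss = nn.functional.mse_loss(self.policy(obs_b), tgt_b)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```

No reward appears anywhere in the update; every change to the network is supervised and driven solely by the teacher's occasional corrections, matching the quoted claim that D-COACH needs no reward function.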
“…In this framework no value function is modeled, since no reward/cost signal is used in the learning process [9]. A parametrized policy is learned directly in the parameter space, as in Policy Search (PS) RL.…”
Section: COACH
Mentioning (confidence: 99%)
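As a counterpart to the deep sketch above, the excerpt's description of COACH (no value function, a parametrized policy adjusted directly in parameter space) can be illustrated with a linear policy. This is a minimal sketch under assumed choices (a radial-basis feature map and a fixed error magnitude e); the full algorithm additionally adapts the correction size using a model of the human teacher and performs credit assignment over recent time steps, which is omitted here.

```python
import numpy as np


def rbf_features(state, centers, width=0.5):
    """Illustrative radial-basis feature map for a low-dimensional state."""
    state = np.asarray(state, dtype=float)
    return np.exp(-np.sum((np.asarray(centers) - state) ** 2, axis=1) / (2 * width ** 2))


class CoachLinearPolicy:
    """Minimal COACH-style learner: a policy linear in the features whose parameters
    are nudged in the direction advised by the corrective signal h in {-1, +1}.
    The error magnitude e and the feature map are assumptions, not the paper's settings."""

    def __init__(self, n_features, e=0.1):
        self.theta = np.zeros(n_features)
        self.e = e  # assumed magnitude of the intended correction

    def action(self, features):
        # Continuous action computed directly from the parametrized policy; no value function.
        return float(self.theta @ features)

    def update(self, features, h):
        # Shift the parameters so the action for similar states moves in the advised
        # direction, scaled by the assumed error magnitude and the active features.
        self.theta += self.e * h * features
```

A teacher's h = +1 or −1 for a given state thus simply shifts the policy's output for that state, and for nearby states activating the same features, in the advised direction.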