2018
DOI: 10.1007/s10846-018-0839-z

An Interactive Framework for Learning Continuous Actions Policies Based on Corrective Feedback

Cited by 37 publications (61 citation statements)
References 38 publications
“…Corrective feedback has been used in Argall et al. (2008, 2011), wherein policies for continuous-action problems are learned from human corrective advice; this kind of feedback was also shown to be faster than critic-only RL algorithms in the reported experiments, even when the users were non-experts (Celemin and Ruiz-del Solar 2015, 2018).…”
Section: Background and Related Work
Mentioning (confidence: 90%)
“…Corrective feedback advised by human teachers is used in the introduced approach, similarly to the aforementioned hybrid learning systems based on RL and human reinforcements. In the proposed approach, human knowledge is provided to the PS learning agents as corrective advice using the COACH algorithm (Celemin and Ruiz-del Solar 2015), which has outperformed some purely autonomous RL agents and purely interactive learning agents based on human reinforcements, and has proven useful in continuous-action problems such as cart-pole balancing, bike balancing, and navigation for humanoid robots (Celemin and Ruiz-del Solar 2018).…”
Section: Introduction
Mentioning (confidence: 99%)
“…Unlike DRL, where the policy is updated with information collected at every time step, in COACH-like methods new data to update the policy is only available when feedback is given by the teacher, so the amount of data used to update the policy may be lower than in the RL case. Since the original COACH has already been widely validated with real human teachers in several tasks, in this work we carried out most of the comparisons using a simulated teacher (a high-performance policy standing in as teacher, which was actually trained with D-COACH and a real human teacher), as in some of the experiments presented in [6], in order to compare the methods under more controlled conditions. The simulated teacher generates feedback using h = sign(a_teacher − a_agent), whereas the decision to advise feedback at each time step is given by the probability P_h = α · exp(−τ · timestep), where α ∈ ℝ with 0 ≤ α ≤ 1 and τ ∈ ℝ with τ ≥ 0.…”
Section: Validation of Replay Buffer with Simulated Teachers
Mentioning (confidence: 99%)
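The simulated-teacher rule quoted above is simple enough to sketch directly. The snippet below is a minimal illustration rather than the authors' code: it returns h = sign(a_teacher − a_agent) with probability P_h = α · exp(−τ · timestep), and the function name and default values of α and τ are placeholders.

```python
import numpy as np

def simulated_teacher_feedback(a_teacher, a_agent, timestep,
                               alpha=0.6, tau=0.0003, rng=None):
    """Sketch of the quoted simulated-teacher rule; alpha/tau defaults are illustrative."""
    if rng is None:
        rng = np.random.default_rng()
    # Probability of advising feedback decays with the time step: P_h = alpha * exp(-tau * t)
    p_h = alpha * np.exp(-tau * timestep)
    if rng.random() < p_h:
        # Binary corrective signal per action dimension: h = sign(a_teacher - a_agent)
        return np.sign(np.asarray(a_teacher) - np.asarray(a_agent))
    return None  # the teacher stays silent at this time step
```

In a training loop this would be called once per time step, and the learner would only perform a policy update when a non-None correction is returned.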
“…We combine Deep Learning (DL) with the corrective-advice-based learning framework COrrective Advice Communicated by Humans (COACH) [6], thus creating the Deep COACH (D-COACH) framework. In this approach, no reward functions are needed and the number of learning episodes is significantly reduced in comparison to alternative approaches.…”
Section: Introduction
Mentioning (confidence: 99%)
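Combining COACH with a neural-network policy, as described above, can be illustrated with a hedged sketch; this is not the authors' implementation. It assumes the corrective signal h converts the executed action into a supervised target a + e·h, that (observation, target) pairs are kept in a replay buffer (as discussed in the earlier excerpt), and that the network is trained on sampled mini-batches with a mean-squared-error loss; the error magnitude e, buffer size, and architecture are illustrative choices.

```python
import collections
import random

import torch
import torch.nn as nn


class DCoachSketch:
    """Hedged sketch of a D-COACH-style learner; constants and architecture are illustrative."""

    def __init__(self, obs_dim, act_dim, e=0.05, buffer_size=10000, lr=1e-3):
        self.policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                    nn.Linear(64, act_dim), nn.Tanh())
        self.opt = torch.optim.Adam(self.policy.parameters(), lr=lr)
        self.e = e                                    # assumed magnitude of each correction
        self.buffer = collections.deque(maxlen=buffer_size)

    def act(self, obs):
        with torch.no_grad():
            return self.policy(torch.as_tensor(obs, dtype=torch.float32))

    def feedback_update(self, obs, action, h, batch_size=32):
        # The corrective signal turns the executed action into a supervised label.
        target = action + self.e * torch.as_tensor(h, dtype=torch.float32)
        self.buffer.append((torch.as_tensor(obs, dtype=torch.float32), target))
        # Train on a small batch drawn from the replay buffer (no reward involved).
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        obs_b = torch.stack([o for o, _ in batch])
        tgt_b = torch.stack([t for _, t in batch])
        loss = nn.functional.mse_loss(self.policy(obs_b), tgt_b)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```

No reward appears anywhere in the update; every change to the network is supervised and driven solely by the teacher's occasional corrections, matching the quoted claim that D-COACH needs no reward function.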
“…In this framework no value function is modeled, since no reward/cost signal is used in the learning process [9]. A parametrized policy is learned directly in the parameter space, as in Policy Search (PS) RL.…”
Section: COACH
Mentioning (confidence: 99%)
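As a counterpart to the deep sketch above, the excerpt's description of COACH (no value function, a parametrized policy adjusted directly in parameter space) can be illustrated with a linear policy. This is a minimal sketch under assumed choices (a radial-basis feature map and a fixed error magnitude e); the full algorithm additionally adapts the correction size using a model of the human teacher and performs credit assignment over recent time steps, which is omitted here.

```python
import numpy as np


def rbf_features(state, centers, width=0.5):
    """Illustrative radial-basis feature map for a low-dimensional state."""
    state = np.asarray(state, dtype=float)
    return np.exp(-np.sum((np.asarray(centers) - state) ** 2, axis=1) / (2 * width ** 2))


class CoachLinearPolicy:
    """Minimal COACH-style learner: a policy linear in the features whose parameters
    are nudged in the direction advised by the corrective signal h in {-1, +1}.
    The error magnitude e and the feature map are assumptions, not the paper's settings."""

    def __init__(self, n_features, e=0.1):
        self.theta = np.zeros(n_features)
        self.e = e  # assumed magnitude of the intended correction

    def action(self, features):
        # Continuous action computed directly from the parametrized policy; no value function.
        return float(self.theta @ features)

    def update(self, features, h):
        # Shift the parameters so the action for similar states moves in the advised
        # direction, scaled by the assumed error magnitude and the active features.
        self.theta += self.e * h * features
```

A teacher's h = +1 or −1 for a given state thus simply shifts the policy's output for that state, and for nearby states activating the same features, in the advised direction.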