2020
DOI: 10.48550/arxiv.2012.06733
Preprint

Human-in-the-Loop Imitation Learning using Remote Teleoperation

Abstract: Imitation Learning is a promising paradigm for learning complex robot manipulation skills by reproducing behavior from human demonstrations. However, manipulation tasks often contain bottleneck regions that require a sequence of precise actions to make meaningful progress, such as a robot inserting a pod into a coffee machine to make coffee. Trained policies can fail in these regions because small deviations in actions can lead the policy into states not covered by the demonstrations. Intervention-based policy…

Cited by 11 publications (33 citation statements) | References 29 publications
“…The proxy Q-value distribution shown in this section not only explains the avoidance behaviors, but also serves as a good indicator of the learned human preference. We benchmark the performance of two human-in-the-loop methods, HG-DAgger (Kelly et al., 2019) and IWR (Mandlekar et al., 2020). Both methods require warming up through behavior cloning on a pre-collected dataset.…”
Section: Discussion (mentioning)
confidence: 99%
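As context for the warm-up step this statement mentions, below is a minimal behavior-cloning pre-training sketch in PyTorch. The network shape, dimensions, and random stand-in dataset are illustrative placeholders assumed for the example, not details taken from any of the cited papers.

```python
import torch
import torch.nn as nn

# Hypothetical behavior-cloning warm-up: fit a policy to a pre-collected
# demonstration dataset before any human-in-the-loop data is gathered.
obs_dim, act_dim = 32, 7  # assumed dimensions, not from the paper
policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in for a pre-collected dataset of (observation, action) pairs.
demo_obs = torch.randn(1024, obs_dim)
demo_act = torch.randn(1024, act_dim)

for epoch in range(50):
    pred = policy(demo_obs)
    loss = nn.functional.mse_loss(pred, demo_act)  # plain BC regression loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```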
“…DAgger (Ross et al., 2011) and its extended methods (Kelly et al., 2019; Zhang & Cho, 2016; Hoque et al., 2021) correct the compounding error (Ross & Bagnell, 2010) of behavior cloning by periodically requesting the expert to provide more demonstrations. Instead of providing demonstrations upon request, Human-Gated DAgger (HG-DAgger) (Kelly et al., 2019), Expert Intervention Learning (EIL) (Spencer et al., 2020), and Intervention Weighted Regression (IWR) (Mandlekar et al., 2020) empower the expert to intervene in exploration and carry the agent to safe states. However, these methods do not impose constraints to reduce human intervention and do not utilize the data from the free exploration of the agent.…”
Section: Related Work (mentioning)
confidence: 99%
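To make the intervention-based scheme this statement describes concrete, here is a brief Python sketch of a human-gated rollout plus an intervention-weighted imitation loss. The `env` and `human` interfaces (`reset`, `step`, `wants_control`, `act`) are hypothetical placeholders, and the equal-weighting scheme is one simplified reading of IWR, not the paper's exact formulation.

```python
import torch

def run_episode(policy, env, human):
    """Roll out the policy while a human supervisor may take over.

    `env` and `human` are hypothetical stand-ins for a robot environment
    and a remote teleoperation client; they are not APIs from the paper.
    """
    data, obs = [], env.reset()
    for _ in range(env.horizon):
        if human.wants_control(obs):  # human-gated takeover, as in HG-DAgger
            action, intervened = human.act(obs), True
        else:
            with torch.no_grad():
                action, intervened = policy(obs), False
        data.append((obs, action, intervened))
        obs = env.step(action)
    return data

def iwr_loss(policy, obs, act, intervened):
    """Intervention-weighted regression (simplified reading of IWR):
    re-weight samples so the intervention and non-intervention subsets
    contribute roughly equally, keeping rare human corrections from
    being drowned out by on-policy data."""
    n_total = intervened.numel()
    n_int = int(intervened.sum())
    w_int = n_total / (2.0 * max(n_int, 1))
    w_auto = n_total / (2.0 * max(n_total - n_int, 1))
    weights = torch.where(
        intervened,
        torch.full_like(intervened, w_int, dtype=torch.float32),
        torch.full_like(intervened, w_auto, dtype=torch.float32),
    )
    per_sample = ((policy(obs) - act) ** 2).mean(dim=-1)  # squared-error BC loss
    return (weights * per_sample).mean()
```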
“…By incorporating humans into training, previous works successfully improve performance on visual-input control tasks such as Atari games [1, 51, 64]. Robotic control tasks also benefit from human feedback [28, 39, 58, 46, 65]. The other category is to have humans in the loop at both training and test time to accurately accomplish human-assistive tasks.…”
Section: Related Work (mentioning)
confidence: 99%
“…are several works showing how humans can interactively teach robotic agents, for example Saxena et al. (2014); Paxton et al. (2017); Mandlekar et al. (2018); Cabi et al. (2019); Mandlekar et al. (2020). In Saxena et al. (2014), the authors demonstrate large-scale crowd-sourcing of data for the perceptual and knowledge-base components of a robotics system.…”
(mentioning)
confidence: 99%