2019
DOI: 10.1017/s0269888919000055
Pre-training with non-expert human demonstration for deep reinforcement learning

Abstract: Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using deep neural networks as function approximators to learn directly from raw input images. However, learning directly from raw images is data inefficient. The agent must learn feature representations of complex states in addition to learning a policy. As a result, deep RL typically suffers from slow learning speeds and often requires a prohibitively large amount of training time and data to reach reasonable…
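The pre-training idea described in the abstract can be sketched roughly as follows: train a convolutional encoder to predict the demonstrator's action from the observed state (a supervised, behavior-cloning-style objective), then initialize the RL agent's value network with that encoder before standard RL training. This is only an illustrative sketch; the layer sizes, the 84×84×4 Atari-style input, and the `demo_loader` of (state, action) pairs are assumptions, not the paper's exact architecture or training procedure.

```python
# Minimal sketch (PyTorch): pre-train a convolutional encoder on human
# demonstration (state, action) pairs, then reuse its weights in a Q-network.
# Layer sizes, the 84x84x4 input, and `demo_loader` are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Convolutional feature extractor shared by pre-training and RL."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.conv(x)

class QNetwork(nn.Module):
    """Q-network whose encoder can be initialized from pre-training."""
    def __init__(self, n_actions, encoder=None):
        super().__init__()
        self.encoder = encoder or Encoder()
        self.head = nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                                  nn.Linear(512, n_actions))

    def forward(self, x):
        return self.head(self.encoder(x))

def pretrain_on_demonstrations(demo_loader, n_actions, epochs=5):
    """Supervised pre-training: predict the human's action from the state."""
    encoder = Encoder()
    classifier = nn.Sequential(encoder, nn.Linear(64 * 7 * 7, n_actions))
    opt = torch.optim.Adam(classifier.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for states, actions in demo_loader:  # states: (B,4,84,84), actions: (B,)
            opt.zero_grad()
            loss = loss_fn(classifier(states), actions)
            loss.backward()
            opt.step()
    return encoder  # hand the pre-trained encoder to the Q-network

# q_net = QNetwork(n_actions=6, encoder=pretrain_on_demonstrations(demo_loader, 6))
# ...then continue with standard deep RL (e.g. DQN) starting from these weights.
```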

Cited by 25 publications (11 citation statements)
References 26 publications
“…The policy can be learned through trial and error (RL) or from an expert's demonstration (IL). A major issue of RL is its sample inefficiency, and human demonstration has been shown to speed up learning (Silver et al. 2016; Hester et al. 2018; de la Cruz, Du, and Taylor 2018; Zhang et al. 2019).…”
Section: Introduction (mentioning, confidence: 99%)
“…We have also proposed a modification of the well-known A3C algorithm for its HAT and SoHAT variants and used it for experimental studies. This modification is an original contribution in itself and can play an important role in future studies of HAT where the learner's and the expert's separate performances might be of interest, precluding an existing alternative where the learner's weights are pre-trained from the expert network (de la Cruz et al., 2017).…”
Section: Discussion (mentioning, confidence: 99%)
“…Cruz et al. (2017) discuss a more direct way to implement the HAT version of DQN by bootstrapping the network weights from a prior trained neural network. By contrast, we chose to keep the classifier separate from the DQN's value function network, to facilitate a fair comparison with SoHAT, which requires repeated retraining on completed state-only expert demonstrations.…”
Section: Experimental Setting (mentioning, confidence: 99%)
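The weight-bootstrapping approach mentioned in the statement above can be illustrated with a small, hypothetical helper: copy every parameter whose name and shape match from a previously trained network into the DQN's value network, leaving the remaining layers (such as the Q-value head) at their random initialization. The function name and the assumption that the two networks share parameter names are for illustration only, not the cited implementation.

```python
# Hypothetical illustration of bootstrapping a DQN's value network from a
# previously trained classifier: copy every parameter whose name and shape
# match, and leave the rest (e.g. the Q-value head) at its random init.
import torch

def bootstrap_weights(pretrained_net, dqn_net):
    """Copy matching parameters from pretrained_net into dqn_net in place."""
    src = pretrained_net.state_dict()
    dst = dqn_net.state_dict()
    copied = {k: v for k, v in src.items()
              if k in dst and v.shape == dst[k].shape}
    dst.update(copied)
    dqn_net.load_state_dict(dst)
    return list(copied)  # names of the layers that were transferred
```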
“…Methods to reduce training time, such as pre-training [7,8], transfer learning [9], and learning from human demonstration [10], have also been developed recently. The crowd ensemble method can also be combined with these methods to further reduce training time.…”
Section: Recent Advancements in Reinforcement Learning (mentioning, confidence: 99%)