2019
DOI: 10.1017/s0269888919000055
Pre-training with non-expert human demonstration for deep reinforcement learning

Abstract: Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using deep neural networks as function approximators to learn directly from raw input images. However, learning directly from raw images is data inefficient. The agent must learn feature representations of complex states in addition to learning a policy. As a result, deep RL typically suffers from slow learning speeds and often requires a prohibitively large amount of training time and data to reach reasonable…
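The pre-training idea described in the abstract can be sketched roughly as follows: train a convolutional encoder to predict the demonstrator's action from the observed state (a supervised, behavior-cloning-style objective), then initialize the RL agent's value network with that encoder before standard RL training. This is only an illustrative sketch; the layer sizes, the 84×84×4 Atari-style input, and the `demo_loader` of (state, action) pairs are assumptions, not the paper's exact architecture or training procedure.

```python
# Minimal sketch (PyTorch): pre-train a convolutional encoder on human
# demonstration (state, action) pairs, then reuse its weights in a Q-network.
# Layer sizes, the 84x84x4 input, and `demo_loader` are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Convolutional feature extractor shared by pre-training and RL."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.conv(x)

class QNetwork(nn.Module):
    """Q-network whose encoder can be initialized from pre-training."""
    def __init__(self, n_actions, encoder=None):
        super().__init__()
        self.encoder = encoder or Encoder()
        self.head = nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                                  nn.Linear(512, n_actions))

    def forward(self, x):
        return self.head(self.encoder(x))

def pretrain_on_demonstrations(demo_loader, n_actions, epochs=5):
    """Supervised pre-training: predict the human's action from the state."""
    encoder = Encoder()
    classifier = nn.Sequential(encoder, nn.Linear(64 * 7 * 7, n_actions))
    opt = torch.optim.Adam(classifier.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for states, actions in demo_loader:  # states: (B,4,84,84), actions: (B,)
            opt.zero_grad()
            loss = loss_fn(classifier(states), actions)
            loss.backward()
            opt.step()
    return encoder  # hand the pre-trained encoder to the Q-network

# q_net = QNetwork(n_actions=6, encoder=pretrain_on_demonstrations(demo_loader, 6))
# ...then continue with standard deep RL (e.g. DQN) starting from these weights.
```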

Cited by 25 publications (11 citation statements)
References 26 publications
“…The policy can be learned through trial and error (RL) or from an expert's demonstration (IL). A major issue of RL is its sample inefficiency, and human demonstration has been shown to speed up learning (Silver et al. 2016; Hester et al. 2018; de la Cruz, Du, and Taylor 2018; Zhang et al. 2019).…”
Section: Introduction (mentioning, confidence: 99%)
“…We have also proposed a modification of the well-known A3C algorithm for its HAT and SoHAT variants and used it for experimental studies. This modification is an original contribution in itself and can play an important role in future studies of HAT where the learner's and the expert's separate performances might be of interest, precluding an existing alternative where the learner's weights are pre-trained from the expert network (de la Cruz et al., 2017).…”
Section: Discussion (mentioning, confidence: 99%)
“…Cruz et al. (2017) discuss a more direct way to implement the HAT version of DQN by bootstrapping the network weights from a prior trained neural network. By contrast, we chose to keep the classifier separate from the DQN's value function network, to facilitate a fair comparison with SoHAT, which requires repeated retraining on completed state-only expert demonstrations.…”
Section: Experimental Setting (mentioning, confidence: 99%)
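The weight-bootstrapping approach mentioned in the statement above can be illustrated with a small, hypothetical helper: copy every parameter whose name and shape match from a previously trained network into the DQN's value network, leaving the remaining layers (such as the Q-value head) at their random initialization. The function name and the assumption that the two networks share parameter names are for illustration only, not the cited implementation.

```python
# Hypothetical illustration of bootstrapping a DQN's value network from a
# previously trained classifier: copy every parameter whose name and shape
# match, and leave the rest (e.g. the Q-value head) at its random init.
import torch

def bootstrap_weights(pretrained_net, dqn_net):
    """Copy matching parameters from pretrained_net into dqn_net in place."""
    src = pretrained_net.state_dict()
    dst = dqn_net.state_dict()
    copied = {k: v for k, v in src.items()
              if k in dst and v.shape == dst[k].shape}
    dst.update(copied)
    dqn_net.load_state_dict(dst)
    return list(copied)  # names of the layers that were transferred
```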
“…Methods to reduce training time, such as pre-training [7,8], transfer learning [9], and learning from human demonstration [10], have also been developed recently. The crowd ensemble method can also be combined with these methods to further reduce training time.…”
Section: Recent Advancements in Reinforcement Learning (mentioning, confidence: 99%)