2018
DOI: 10.1109/tnnls.2018.2790981

Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning

Abstract: In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes full advantage of experience replay by adaptively selecting appropriate transitions from the replay memory based on the complexity of each transition. The complexity criteria in DCRL consist of a self-paced priority and a coverage penalty. The self-paced priority reflects the relationship…
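The selection criterion described in the abstract (a self-paced priority combined with a coverage penalty over frequently replayed transitions) can be illustrated with a short sketch. The functional forms, the parameter names (spl_threshold, penalty_coef), and the way the two terms are combined below are assumptions made for illustration, not the paper's exact equations.

```python
import numpy as np

def dcrl_complexity_scores(td_errors, replay_counts, spl_threshold=1.0, penalty_coef=0.1):
    """Illustrative DCRL-style selection scores (not the paper's exact formulas).

    The self-paced priority favours transitions whose TD error is close to a
    difficulty threshold that grows as training progresses; the coverage
    penalty discounts transitions that have already been replayed often.
    """
    td_errors = np.abs(np.asarray(td_errors, dtype=float))
    replay_counts = np.asarray(replay_counts, dtype=float)

    # Self-paced priority: high for "appropriately difficult" transitions,
    # low for transitions far easier or far harder than the current threshold.
    self_paced_priority = np.exp(-((td_errors - spl_threshold) ** 2))

    # Coverage penalty: the more often a transition has been replayed,
    # the lower its selection score.
    coverage_penalty = penalty_coef * replay_counts

    scores = np.maximum(self_paced_priority - coverage_penalty, 1e-6)
    return scores / scores.sum()  # normalised sampling probabilities
```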

Cited by 102 publications (54 citation statements).
References 36 publications (48 reference statements).
“…This mapping results from sampling the training set according to the probabilities at the current epoch $p^{(e)}$. Minibatches are then formed from $\{X, Y\}_c$ and the probabilities are decayed towards a uniform distribution [2], based on the following function [12]: $\exp(-c\,n_i^2/10)\ \forall e > 0$,…”
mentioning
confidence: 99%
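A minimal sketch of the decay described in this excerpt, assuming the quoted factor $\exp(-c\,n_i^2/10)$ acts as a per-epoch mixing weight between the curriculum distribution and the uniform distribution; the exact roles of c and n_i in the cited work may differ.

```python
import numpy as np

def decayed_sampling_probs(p_init, epoch, c=1.0):
    """Hedged sketch: decay per-example sampling probabilities toward uniform.

    Interprets the quoted factor exp(-c * n_i**2 / 10) as a mixing weight at
    epoch n_i = epoch; this interpretation is an assumption.
    """
    p_init = np.asarray(p_init, dtype=float)
    uniform = np.full_like(p_init, 1.0 / p_init.size)
    if epoch <= 0:
        return p_init
    w = np.exp(-c * epoch ** 2 / 10.0)    # curriculum weight shrinks over epochs
    p = w * p_init + (1.0 - w) * uniform  # interpolate toward uniform sampling
    return p / p.sum()
```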
“…An RL agent learns a mapping between the environment's state space and the action space through its interaction with the environment, including observing the system's state, selecting and executing actions, and receiving a numerical reward [23]. The mathematical and theoretical basis of RL is discrete-time finite-state MDPs [24]. In a general way, a five-element tuple…”
Section: Reinforcement
mentioning
confidence: 99%
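The five-element tuple of a discrete-time finite-state MDP mentioned in this excerpt is conventionally (S, A, P, R, γ); a minimal, generic container (not taken from the cited paper) might look like:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class FiniteMDP:
    """Generic container for the standard MDP five-tuple (S, A, P, R, gamma)."""
    states: List[int]                                     # S: finite state set
    actions: List[int]                                    # A: finite action set
    transition: Dict[Tuple[int, int], Dict[int, float]]  # P(s' | s, a)
    reward: Dict[Tuple[int, int], float]                  # R(s, a)
    gamma: float                                          # discount factor in [0, 1)
```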
“…Schaul et al. [39] introduce Prioritized Experience Replay (PER), which prioritizes important experiences to be sampled from the replay buffer so that training examples follow a curriculum learning scheme. Subsequently, Ren et al. [40] combine a self-paced prioritization function with a coverage penalty function, selecting samples of appropriate difficulty while penalizing samples that are replayed too frequently. Other studies, such as [41] and [42], use curriculum learning to schedule an ordered list of tasks and maps to be solved by the RL agent.…”
Section: Curriculum Learning
mentioning
confidence: 99%
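A simplified sketch of the proportional prioritized sampling that PER [39] is based on, omitting the sum-tree data structure and importance-sampling weight correction used in the full algorithm:

```python
import numpy as np

def per_sample(priorities, batch_size, alpha=0.6, rng=None):
    """Simplified proportional PER sampling: P(i) is proportional to p_i**alpha.

    Illustrative only; the full algorithm uses a sum-tree for efficiency and
    corrects the induced bias with importance-sampling weights.
    """
    rng = rng or np.random.default_rng()
    p = np.asarray(priorities, dtype=float) ** alpha
    probs = p / p.sum()
    return rng.choice(len(priorities), size=batch_size, p=probs)
```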