2021
DOI: 10.48550/arxiv.2110.03032
Preprint

Learning Multi-Objective Curricula for Deep Reinforcement Learning

Abstract: Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL). They are designed to control how a DRL agent collects data, which is inspired by how humans gradually adapt their learning processes to their capabilities. For example, ACL can be used for subgoal generation, reward shaping, environment generation, or initial state generation. However, prior work only considers curriculum learning following one of t…
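To make the feedback loop the abstract describes concrete, here is a minimal sketch of one ACL mode it mentions, initial state generation: a teacher widens the start-state distribution as the student's success rate rises. The Teacher class, the success-rate thresholds, and the single difficulty knob are all illustrative assumptions, not the paper's actual multi-objective method, which learns several curricula jointly.

```python
# Minimal sketch of automatic curriculum learning (ACL) via initial-state
# generation. All names (Teacher, thresholds, difficulty knob) are assumed
# for illustration; they are not the cited paper's algorithm.
import random

class Teacher:
    """Adapts a single difficulty parameter from the student's success rate."""
    def __init__(self, step=0.05):
        self.difficulty = 0.1   # fraction of the max start distance from the goal
        self.step = step

    def propose_task(self):
        # Initial-state generation: sample a start state whose distance to
        # the goal grows with the current difficulty.
        return random.uniform(0.0, self.difficulty)

    def update(self, success_rate):
        # Raise difficulty once the student masters the current band,
        # lower it when the student struggles: the basic ACL feedback loop.
        if success_rate > 0.8:
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif success_rate < 0.3:
            self.difficulty = max(0.05, self.difficulty - self.step)

def run_episode(start_distance):
    # Stand-in for one DRL episode; success gets less likely as the
    # start state moves farther from the goal.
    return random.random() > start_distance

teacher = Teacher()
for epoch in range(20):
    results = [run_episode(teacher.propose_task()) for _ in range(50)]
    rate = sum(results) / len(results)
    teacher.update(rate)
    print(f"epoch {epoch:2d}  success {rate:.2f}  difficulty {teacher.difficulty:.2f}")
```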

Cited by 4 publications (4 citation statements)
References 36 publications
“…Using the Q-Censoring strategy, learning is simplified and allows for a higher reward in time with little effort. The Non-Censoring Noisy Network version can be seen as a curricular learning like in [39], where it is assimilated first through the boundaries of the map, and later, the entropy reduction policy is optimized. The censored strategy alleviates this condition, especially in longer episodes, where the possibility of collision is greater.…”
Section: Learning Results (citation type: mentioning; confidence: 99%)
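The snippet above contrasts a censored and an uncensored Q-learning variant. One plausible reading of the "Q-Censoring strategy" (my assumption, not a definition taken from the citing paper) is that Q-values of actions predicted to cause a collision are masked out before the greedy choice; a minimal sketch:

```python
import numpy as np

def censored_greedy_action(q_values, unsafe_mask):
    """Pick the greedy action after censoring Q-values of unsafe actions.

    q_values:    (num_actions,) array of Q estimates
    unsafe_mask: (num_actions,) boolean array, True where the action is
                 predicted to lead to a collision (assumed interpretation)
    """
    censored = np.where(unsafe_mask, -np.inf, q_values)
    return int(np.argmax(censored))

q = np.array([1.2, 3.4, 0.7, 2.9])
mask = np.array([False, True, False, False])  # action 1 would collide
print(censored_greedy_action(q, mask))        # -> 3, best among safe actions
```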
“…curricular learning like in [35], where it is assimilated first through the boundaries of the map, and later, the entropy reduction policy is optimized. The censored strategy alleviates this condition, especially in longer episodes, where the possibility of collision is greater.…”
Section: Learning Results (citation type: mentioning; confidence: 99%)
“…The work [Lv et al. 2017] proposes training tricks such as random scaling to improve generalization. Besides the above, some work [Knyazev et al. 2021; Kang et al. 2021] proposes to directly predict parameters, which can be seen as a specially learned optimizer without any gradient update.…”
Section: Optimizer (citation type: mentioning; confidence: 99%)
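The last sentence, on predicting parameters directly as a "specially learned optimizer", can be illustrated with a toy hypernetwork that maps a task embedding straight to the weights of a small target network, with no gradient steps on the target. Everything below (the shapes, the random linear hypernetwork) is assumed purely for illustration; in the cited works the parameter predictor is itself trained across many tasks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hypernetwork: a fixed linear map from a task embedding to the
# flattened weights of a tiny 2-layer target net (4 -> 8 -> 1). Here the map
# is random, purely to show the data flow of one-shot parameter prediction.
EMB, H = 16, 8
n_params = 4 * H + H + H * 1 + 1
hyper_W = rng.normal(scale=0.1, size=(n_params, EMB))

def predict_params(task_embedding):
    # One forward pass of the hypernetwork yields all target-net weights.
    flat = hyper_W @ task_embedding
    W1 = flat[: 4 * H].reshape(4, H)
    b1 = flat[4 * H : 4 * H + H]
    W2 = flat[4 * H + H : 4 * H + H + H].reshape(H, 1)
    b2 = flat[-1:]
    return W1, b1, W2, b2

def forward(x, params):
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

task_emb = rng.normal(size=EMB)
params = predict_params(task_emb)   # predicted in one shot, no gradient update
x = rng.normal(size=(5, 4))
print(forward(x, params).shape)     # (5, 1)
```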