2019
DOI: 10.1007/978-3-030-21803-4_72

A Gray-Box Approach for Curriculum Learning

Abstract: Curriculum learning is often employed in deep reinforcement learning to let the agent progress more quickly towards better behaviors. Numerical methods for curriculum learning in the literature provide only initial heuristic solutions, with little to no guarantee on their quality. We define a new gray-box function that, including a suitable scheduling problem, can be effectively used to reformulate the curriculum learning problem. We propose different efficient numerical methods to address this gray-box refor…

Citations: cited by 2 publications (1 citation statement)
References: 11 publications (19 reference statements)
“…Finally, instead of treating the entire problem as a black box, it has also treated it as a gray box. Foglino et al (2019c) propose such an approach, formulating the optimization problem as the composition of a white box scheduling problem and black box parameter optimization. The scheduling formulation partially models the effects of a given sequence, assigning a utility to each task, and a penalty to each pair of tasks, which captures the effect on the objective of learning two tasks one after the other.…”
Section: Combinatorial Optimization and Search
Citation type: mentioning
confidence: 99%
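
The white-box scheduling part described in the statement above (a utility per task, a penalty per pair of tasks) can be illustrated with a minimal sketch. This is not the authors' formulation: the function names, the example numbers, and the rule that a penalty applies whenever one task is learned anywhere before another are all assumptions made for illustration. In the gray-box setting, the utilities and penalties would be the parameters tuned by the outer black-box optimizer; here they are fixed by hand.

```python
from itertools import permutations


def schedule_value(sequence, utility, penalty):
    """Score a curriculum: summed task utilities minus pairwise ordering penalties."""
    total = sum(utility[task] for task in sequence)
    # Assumption: penalty[(a, b)] is charged whenever task a is learned before task b.
    for i, a in enumerate(sequence):
        for b in sequence[i + 1:]:
            total -= penalty.get((a, b), 0.0)
    return total


def best_schedule(tasks, utility, penalty):
    """Brute-force search over all ordered subsets of tasks (tiny task sets only)."""
    best_seq, best_val = (), 0.0  # the empty curriculum scores 0
    for k in range(1, len(tasks) + 1):
        for seq in permutations(tasks, k):
            val = schedule_value(seq, utility, penalty)
            if val > best_val:
                best_seq, best_val = seq, val
    return best_seq, best_val


# Hypothetical parameters; in a gray-box scheme these would come from
# the black-box parameter optimization rather than being set by hand.
utility = {"A": 3.0, "B": 2.0, "C": 1.0}
penalty = {
    ("A", "B"): 0.5, ("B", "A"): 2.0,
    ("A", "C"): 1.5, ("C", "A"): 0.0,
    ("B", "C"): 0.25, ("C", "B"): 0.5,
}

print(best_schedule(["A", "B", "C"], utility, penalty))
# (('C', 'A', 'B'), 5.0)
```

Exhaustive enumeration is used only to keep the sketch short; it grows factorially with the number of tasks, so a realistic task set would need a proper scheduling solver.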