2022
DOI: 10.48550/arxiv.2201.12403
Preprint

Planning and Learning with Adaptive Lookahead

Abstract: The classical Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation. Recent literature shows that multi-step lookahead policy improvement yields a better convergence rate at the expense of increased per-iteration complexity. However, one cannot tell before running the algorithm which fixed lookahead horizon is best, and within a given run, a lookahead horizon larger than one is often wasteful. In this work, we propose for the first time …
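To make the abstract's contrast concrete, below is a minimal sketch of policy iteration with a fixed h-step lookahead improvement step on a small tabular MDP. It is an illustration under assumed names (P, R, gamma, h), not the paper's algorithm: the paper's point is precisely that the best fixed h is unknown in advance, which motivates choosing the lookahead adaptively.

```python
# Minimal sketch of PI with a fixed h-step lookahead improvement step.
# All names (P, R, gamma, h) are illustrative assumptions, not the
# paper's notation or code.
import numpy as np

def policy_evaluation(P, R, policy, gamma):
    """Solve V = R_pi + gamma * P_pi V exactly as a linear system."""
    n_states = P.shape[1]
    idx = np.arange(n_states)
    P_pi = P[policy, idx]                     # (S, S) dynamics under pi
    R_pi = R[policy, idx]                     # (S,)  rewards under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

def lookahead_improvement(P, R, V, gamma, h):
    """Return the greedy first action of the h-horizon optimal plan
    rooted at terminal values V (h = 1 recovers classical PI)."""
    V_h = V
    for _ in range(h - 1):                    # h-1 Bellman optimality backups
        V_h = (R + gamma * P @ V_h).max(axis=0)
    Q = R + gamma * P @ V_h                   # (A, S) depth-h action values
    return Q.argmax(axis=0)

def policy_iteration(P, R, gamma=0.9, h=3, max_iters=100):
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iters):
        V = policy_evaluation(P, R, policy, gamma)
        new_policy = lookahead_improvement(P, R, V, gamma, h)
        if np.array_equal(new_policy, policy):
            break                             # improvement step is stable
        policy = new_policy
    return policy

# Tiny random MDP to exercise the sketch.
rng = np.random.default_rng(0)
A, S = 3, 5
P = rng.dirichlet(np.ones(S), size=(A, S))    # (A, S, S), rows sum to 1
R = rng.uniform(size=(A, S))                  # (A, S) rewards
print(policy_iteration(P, R, gamma=0.9, h=3))
```

Setting h = 1 recovers classical PI; larger h trades extra backups per improvement step for faster convergence, which is exactly the tradeoff the abstract describes.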

Cited by 1 publication (1 citation statement)
References 6 publications
“…Instead, in finite action space environments such as Atari, we compute the exact expectation in SoftTreeMax with an exhaustive TS of depth d. Despite the exponential computational cost of spanning the entire tree, recent advancements in parallel GPU-based simulation allow efficient expansion of all nodes at the same depth simultaneously (Dalal et al., 2021; Rosenberg et al., 2022). This is possible when a simulator is implemented on GPU (Dalton et al., 2020; Makoviychuk et al., 2021; Freeman et al., 2021), or when a forward model is learned (Kim et al., 2020; Ha & Schmidhuber, 2018).…”
Section: SoftTreeMax: Deep Parallel Implementation
confidence: 99%
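As a rough illustration of the quoted mechanism, the sketch below expands an exhaustive tree of depth d one level at a time, applying every action to every node at the current depth in a single batched step; this is the pattern that a GPU simulator parallelizes to make exact SoftTreeMax expectations tractable. All names here (P, R, expand_tree) are assumptions for illustration, not the cited papers' implementation.

```python
# Level-by-level exhaustive tree expansion of depth d for a tabular MDP,
# with NumPy on CPU standing in for a GPU simulator. Each node carries a
# distribution over states, so the expansion computes exact expectations
# as in the quote. Illustrative sketch, not the cited implementation.
import numpy as np

def expand_tree(P, R, s0, d, gamma=0.99):
    """Expand all A^d action sequences from state s0, one depth at a time."""
    n_actions, n_states, _ = P.shape
    dist = np.zeros((1, n_states))
    dist[0, s0] = 1.0                          # root: point mass on s0
    returns = np.zeros(1)
    for depth in range(d):
        # Expand every node at this depth by every action at once -- the
        # batched step that maps well onto parallel GPU simulation.
        rewards = dist @ R.T                   # (nodes, A) expected rewards
        children = np.einsum('ns,ast->nat', dist, P)  # (nodes, A, S)
        returns = (returns[:, None] + gamma**depth * rewards).ravel()
        dist = children.reshape(-1, n_states)
    return returns                             # (A^d,) cumulative returns

# SoftTreeMax-style use (illustrative): softmax over depth-d path returns.
rng = np.random.default_rng(0)
A, S = 2, 4
P = rng.dirichlet(np.ones(S), size=(A, S))     # (A, S, S) dynamics
R = rng.uniform(size=(A, S))                   # (A, S) rewards
logits = expand_tree(P, R, s0=0, d=3)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # distribution over 2**3 paths
print(probs)
```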