2021
DOI: 10.1109/access.2021.3094566
A Functional Clipping Approach for Policy Optimization Algorithms

Abstract: Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning, with one of its key points being the use of a surrogate objective function to restrict the step size at each policy update. Although such a restriction is helpful, the algorithm still suffers from performance instability and optimization inefficiency caused by the sudden flattening of the curve. To address this issue, we present a novel functional clipping policy optimization algorithm, named…
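
The abstract refers to PPO's clipped surrogate objective, whose hard clipping is what produces the "sudden flattening" of the objective curve. For reference, below is a minimal sketch of that standard clipped loss in PyTorch (an assumed framework); it is not the paper's functional clipping variant, whose exact form is truncated above, and the function and variable names are illustrative.

```python
# Minimal sketch of the standard PPO clipped surrogate loss (PyTorch assumed).
# Names (ppo_clip_loss, log_prob_new, ...) are illustrative, not from the paper.
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, epsilon=0.2):
    """Pessimistic clipped surrogate: the hard clamp flattens the objective
    (zero gradient) once the probability ratio leaves [1 - eps, 1 + eps]."""
    ratio = torch.exp(log_prob_new - log_prob_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Element-wise minimum of the two terms, negated for gradient descent.
    return -torch.min(unclipped, clipped).mean()
```

A functional clipping approach, as the title suggests, would replace the hard clamp above with a smoother clipping function so that the gradient decays gradually instead of vanishing abruptly at the clip boundary; the specific function used in the paper is not shown in the truncated abstract.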

Cited by 5 publications (7 citation statements). References 11 publications (14 reference statements).
“…Since their performances generally decreased with stronger regularization methods, these results are probably due to too strong regularization of π to b for the replayed data. As pointed out in (Wang et al., 2020; Zhu and Rosendo, 2021), PPO has no capability to softly constrain π to b even with a too small threshold, and therefore, it yielded the near-optimal policy in the tasks except Swingup. In contrast to them, only PPO-RPE-A stably learned all the tasks.…”
Section: Results for Simple Tasks
confidence: 96%
“…{e,r}PPO-RB (Wang et al., 2020): η = 0.3 as the recommended value;
3. {e,r}PPOS (Zhu and Rosendo, 2021): η = 0.3 as the recommended value;
4. {e,r}PPO-RPE (Kobayashi, 2021a):…”
Section: Results for Simple Tasks
confidence: 99%