2023
DOI: 10.1016/j.robot.2023.104383

Iterative reward shaping for non-overshooting altitude control of a wing-in-ground craft based on deep reinforcement learning

Cited by 2 publications (1 citation statement)
References 20 publications
“…Based on zeroth-order optimization techniques, it uses multiple system trajectories to estimate the policy gradient. There has been a resurgent interest in studying theoretical properties of PO on the LQR problem such as convergence and sample complexity; see e.g., [4]-[7] and the comprehensive survey [8]. Even though global convergence has been shown for the nonconvex PO Research of Feiran Zhao and Keyou You was supported by National Natural Science Foundation of China under Grant no.…”
Section: Introduction
Citation type: mentioning
confidence: 99%